mergoo by Leeroo-AI

Library for merging LLM experts and training merged models

created 1 year ago
489 stars

Top 64.0% on sourcepulse

Project Summary

Mergoo is a Python library designed for merging multiple Large Language Model (LLM) experts into a single, more capable model. It targets researchers and developers looking to efficiently combine specialized LLMs, such as domain-specific or instruction-tuned models, to create unified, powerful architectures. The primary benefit is the ability to leverage diverse LLM capabilities without the complexity of managing separate models.

How It Works

Mergoo supports several merging strategies: Mixture-of-Experts (MoE), Mixture-of-Adapters (MoA), and layer-wise merging. For MoE, it injects router (gating) layers into specified transformer blocks, so each token is routed to weights drawn from the different "expert" models. MoA applies the same idea to LoRA adapters, creating a mixture of fine-tuned adapter weights. This modular design lets users integrate expert knowledge incrementally and choose how much to train afterwards: only the routers, or the entire merged model.
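
A minimal sketch of the MoE composition flow, based on the usage shown in the project's notebooks. The ComposeExperts class and the config keys (num_experts_per_tok, router_layers) are taken from those examples but should be verified against the installed version; the expert model IDs are placeholders.

    import torch
    from mergoo.compose_experts import ComposeExperts

    # Merge config: base architecture, number of experts routed per token,
    # the expert checkpoints to merge, and the layers that receive a router.
    config = {
        "model_type": "mistral",
        "num_experts_per_tok": 2,
        "experts": [
            {"expert_name": "base_expert", "model_id": "mistralai/Mistral-7B-v0.1"},
            {"expert_name": "expert_math", "model_id": "meta-math/MetaMath-Mistral-7B"},  # placeholder expert
            {"expert_name": "expert_code", "model_id": "uukuguy/speechless-code-mistral-7b-v1.0"},  # placeholder expert
        ],
        # Routers are injected into these MLP projections of each selected transformer block.
        "router_layers": ["gate_proj", "up_proj", "down_proj"],
    }

    # Build the merged model; the new router/gate layers start untrained.
    merger = ComposeExperts(config, torch_dtype=torch.float16)
    merger.compose()
    merger.save_checkpoint("data/mistral_moe_checkpoint")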

Quick Start & Requirements

  • Installation: pip install mergoo or pip install git+https://github.com/Leeroo-AI/mergoo
  • Prerequisites: Python, PyTorch. Supports Llama, Mistral, Phi3, and BERT base models. Compatible with Hugging Face Trainer, TRL's SFTTrainer, and PEFT (see the training sketch after this list).
  • Resources: Requires sufficient VRAM for the base models and merged experts. Specific requirements depend on model size and merging strategy.
  • Docs: Notebooks provide detailed examples.
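
As referenced above, here is a minimal sketch of router-only training on a merged checkpoint. It assumes the modeling class lives under mergoo.models.modeling_mistral and that router parameters carry "gate" in their names, as in the project's examples; both are assumptions to verify.

    from mergoo.models.modeling_mistral import MistralForCausalLM  # assumed module path; see the notebooks

    # Load the merged checkpoint produced by ComposeExperts (placeholder path).
    model = MistralForCausalLM.from_pretrained("data/mistral_moe_checkpoint")

    # Router-only training: freeze every parameter except the injected router/gate weights.
    # Assumption: router parameter names contain "gate"; verify with model.named_parameters().
    for name, param in model.named_parameters():
        param.requires_grad = "gate" in name

    # The model can now be passed to transformers.Trainer, TRL's SFTTrainer, or PEFT as usual.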

Highlighted Details

  • Supports Mixture-of-Experts (MoE) and Mixture-of-Adapters (MoA) merging (see the MoA sketch after this list).
  • Flexible layer selection for router injection.
  • Compatible with Llama (including Llama3), Mistral, and Phi3 architectures.
  • Enables training of only routers or full fine-tuning of the merged model.
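
For the Mixture-of-Adapters path, a hedged sketch assuming the config mirrors the MoE case, with a shared base model and LoRA adapters listed as experts. The base_model key, the adapter-style expert names, and the adapter IDs are illustrative assumptions to check against the project's MoA notebook.

    import torch
    from mergoo.compose_experts import ComposeExperts

    # Mixture-of-Adapters: experts are LoRA adapters fine-tuned on a shared base model.
    # All field names and adapter IDs below are illustrative placeholders.
    config = {
        "model_type": "mistral",
        "num_experts_per_tok": 2,
        "base_model": "mistralai/Mistral-7B-v0.1",
        "experts": [
            {"expert_name": "adapter_support", "model_id": "your-org/lora-support-adapter"},
            {"expert_name": "adapter_sql", "model_id": "your-org/lora-sql-adapter"},
        ],
    }

    merger = ComposeExperts(config, torch_dtype=torch.float16)
    merger.compose()
    merger.save_checkpoint("data/mistral_moa_checkpoint")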

Maintenance & Community

Development follows a public roadmap that lists planned features such as router load balancing, support for additional model architectures (Gemma, Mamba), and flash-attention integration. Community engagement is encouraged via GitHub Issues, email, Twitter, LinkedIn, and Discord.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Users should verify licensing for commercial use and ensure compatibility with their existing closed-source projects.

Limitations & Caveats

The README notes that the router/gate layers are untrained after merging, which produces weight-loading warnings until they are fine-tuned. Features such as lazy tensor loading for memory efficiency and support for additional merging methods (e.g., those implemented in Mergekit) are still under development.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 18 stars in the last 90 days
