mergoo by Leeroo-AI

Library for merging LLM experts and training merged models

created 1 year ago
489 stars

Top 64.0% on sourcepulse

Project Summary

Mergoo is a Python library designed for merging multiple Large Language Model (LLM) experts into a single, more capable model. It targets researchers and developers looking to efficiently combine specialized LLMs, such as domain-specific or instruction-tuned models, to create unified, powerful architectures. The primary benefit is the ability to leverage diverse LLM capabilities without the complexity of managing separate models.

How It Works

Mergoo supports several merging strategies: Mixture-of-Experts (MoE), Mixture-of-Adapters (MoA), and layer-wise merging. For MoE, it injects router (gating) layers into specified transformer blocks, so each token is routed to weights drawn from the different "expert" models. MoA applies the same idea to LoRA adapters, creating a mixture of fine-tuned adapter weights. This modular design lets users integrate expert knowledge incrementally and choose how much to train afterwards: only the routers, or the entire merged model.
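
A minimal sketch of the MoE composition flow, based on the usage shown in the project's notebooks. The ComposeExperts class and the config keys (num_experts_per_tok, router_layers) are taken from those examples but should be verified against the installed version; the expert model IDs are placeholders.

    import torch
    from mergoo.compose_experts import ComposeExperts

    # Merge config: base architecture, number of experts routed per token,
    # the expert checkpoints to merge, and the layers that receive a router.
    config = {
        "model_type": "mistral",
        "num_experts_per_tok": 2,
        "experts": [
            {"expert_name": "base_expert", "model_id": "mistralai/Mistral-7B-v0.1"},
            {"expert_name": "expert_math", "model_id": "meta-math/MetaMath-Mistral-7B"},  # placeholder expert
            {"expert_name": "expert_code", "model_id": "uukuguy/speechless-code-mistral-7b-v1.0"},  # placeholder expert
        ],
        # Routers are injected into these MLP projections of each selected transformer block.
        "router_layers": ["gate_proj", "up_proj", "down_proj"],
    }

    # Build the merged model; the new router/gate layers start untrained.
    merger = ComposeExperts(config, torch_dtype=torch.float16)
    merger.compose()
    merger.save_checkpoint("data/mistral_moe_checkpoint")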

Quick Start & Requirements

  • Installation: pip install mergoo or pip install git+https://github.com/Leeroo-AI/mergoo
  • Prerequisites: Python, PyTorch. Supports Llama, Mistral, Phi3, and BERT base models. Compatible with Hugging Face Trainer, TRL's SFTTrainer, and PEFT (see the training sketch after this list).
  • Resources: Requires sufficient VRAM for the base models and merged experts. Specific requirements depend on model size and merging strategy.
  • Docs: Notebooks provide detailed examples.
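
As referenced above, here is a minimal sketch of router-only training on a merged checkpoint. It assumes the modeling class lives under mergoo.models.modeling_mistral and that router parameters carry "gate" in their names, as in the project's examples; both are assumptions to verify.

    from mergoo.models.modeling_mistral import MistralForCausalLM  # assumed module path; see the notebooks

    # Load the merged checkpoint produced by ComposeExperts (placeholder path).
    model = MistralForCausalLM.from_pretrained("data/mistral_moe_checkpoint")

    # Router-only training: freeze every parameter except the injected router/gate weights.
    # Assumption: router parameter names contain "gate"; verify with model.named_parameters().
    for name, param in model.named_parameters():
        param.requires_grad = "gate" in name

    # The model can now be passed to transformers.Trainer, TRL's SFTTrainer, or PEFT as usual.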

Highlighted Details

  • Supports Mixture-of-Experts (MoE) and Mixture-of-Adapters (MoA) merging (see the MoA sketch after this list).
  • Flexible layer selection for router injection.
  • Compatible with Llama (including Llama3), Mistral, and Phi3 architectures.
  • Enables training of only routers or full fine-tuning of the merged model.
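
For the Mixture-of-Adapters path, a hedged sketch assuming the config mirrors the MoE case, with a shared base model and LoRA adapters listed as experts. The base_model key, the adapter-style expert names, and the adapter IDs are illustrative assumptions to check against the project's MoA notebook.

    import torch
    from mergoo.compose_experts import ComposeExperts

    # Mixture-of-Adapters: experts are LoRA adapters fine-tuned on a shared base model.
    # All field names and adapter IDs below are illustrative placeholders.
    config = {
        "model_type": "mistral",
        "num_experts_per_tok": 2,
        "base_model": "mistralai/Mistral-7B-v0.1",
        "experts": [
            {"expert_name": "adapter_support", "model_id": "your-org/lora-support-adapter"},
            {"expert_name": "adapter_sql", "model_id": "your-org/lora-sql-adapter"},
        ],
    }

    merger = ComposeExperts(config, torch_dtype=torch.float16)
    merger.compose()
    merger.save_checkpoint("data/mistral_moa_checkpoint")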

Maintenance & Community

Development follows a public roadmap that lists planned features such as router load balancing, support for additional model architectures (Gemma, Mamba), and flash-attention integration. Community engagement is encouraged via GitHub Issues, email, Twitter, LinkedIn, and Discord.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Users should verify licensing for commercial use and ensure compatibility with their existing closed-source projects.

Limitations & Caveats

The README notes that the router/gate layers are untrained after merging, which produces weight-loading warnings until they are fine-tuned. Features such as lazy tensor loading for memory efficiency and support for additional merging methods (e.g., those implemented in Mergekit) are still under development.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 18 stars in the last 90 days
