mergekit by arcee-ai

CLI tool for merging pretrained language models, combining strengths without retraining

created 1 year ago
6,122 stars

Top 8.6% on sourcepulse

View on GitHub
Project Summary

Mergekit is a toolkit for merging pre-trained large language models, enabling users to combine the strengths of different models without the computational overhead of ensembling or additional training. It is designed for researchers and practitioners who want to create more versatile and performant models by operating directly in the weight space, even in resource-constrained environments.

How It Works

Mergekit employs an out-of-core approach, allowing it to handle large models with limited RAM and VRAM. It supports a wide array of merging algorithms, including linear interpolation, SLERP, TIES, DARE, and task arithmetic, among others. The toolkit offers fine-grained control over the merging process through YAML configuration files, allowing users to specify which models to merge, how to slice their layers, and how to handle parameters, tokenizers, and chat templates.
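For illustration, a merge configuration is a small YAML file. The sketch below shows what a TIES merge between two checkpoints might look like; the second model name is a placeholder, and the exact schema should be checked against the documentation and examples in the repository.

    models:
      - model: mistralai/Mistral-7B-v0.1        # base model; no per-model parameters needed
      - model: example-org/finetuned-7b         # placeholder fine-tune to fold in
        parameters:
          density: 0.5                          # fraction of each task vector kept after sparsification
          weight: 0.5                           # relative contribution of this model
    merge_method: ties
    base_model: mistralai/Mistral-7B-v0.1
    parameters:
      normalize: true
    dtype: float16

Each top-level key maps onto the options described above: which models to merge, which algorithm to apply, and the numeric precision of the output.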

Quick Start & Requirements

  • Install from source: git clone https://github.com/arcee-ai/mergekit.git && cd mergekit && pip install -e . (see the usage sketch after this list).
  • Requires Python 3.12+; if pip reports that setup.py is not found, upgrade pip and retry.
  • Can run on CPU or with as little as 8 GB VRAM.
  • Official documentation and examples are available in the repository.
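Assuming the source install above succeeds, a merge is launched by pointing the mergekit-yaml entry point at a configuration file and an output directory. The paths below are placeholders, and the available flags should be confirmed with mergekit-yaml --help.

    git clone https://github.com/arcee-ai/mergekit.git
    cd mergekit
    pip install -e .

    # write the merged model described by config.yml into ./output-model-directory;
    # --cuda uses the GPU for the merge computation (optional, runs on CPU without it)
    mergekit-yaml config.yml ./output-model-directory --cuda

The resulting directory contains a Hugging Face-format model that can be loaded or uploaded like any other checkpoint.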

Highlighted Details

  • Supports numerous merge methods including linear, SLERP, TIES, DARE, Task Arithmetic, and Mixture of Experts.
  • Offers advanced features like LoRA extraction and piecewise assembly of models ("Frankenmerging"); a layer-slicing sketch follows this list.
  • Provides flexible tokenizer and chat template configuration for merged models.
  • Includes an optional GUI via Arcee App and Hugging Face Spaces.
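As a sketch of the "Frankenmerging" feature mentioned above, a passthrough merge stitches together layer ranges taken from different checkpoints instead of averaging weights. The models and layer ranges below are illustrative only; consult the repository's examples for configurations known to produce working models.

    slices:
      - sources:
          - model: example-org/model-a-13b      # placeholder donor model
            layer_range: [0, 24]                # take its first 24 transformer layers
      - sources:
          - model: example-org/model-b-13b      # second placeholder donor
            layer_range: [20, 40]               # continue with its upper layers
    merge_method: passthrough                   # copy slices through without weight arithmetic
    dtype: float16

Note that the resulting model has more layers than either parent; whether it performs well depends heavily on the choice of slice boundaries.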

Maintenance & Community

The project is actively maintained by arcee-ai. The README does not detail community channels or a roadmap.

Licensing & Compatibility

The provided README text does not state a license. Users should verify the repository's license before commercial use or closed-source linking.

Limitations & Caveats

The README notes a potential installation failure when pip cannot find setup.py, resolved by upgrading pip. Compatibility notes for model architectures beyond those listed (Llama, Mistral, GPT-NeoX, StableLM) are not provided.

Health Check

  • Last commit: 4 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 3
  • Issues (30d): 8

Star History

525 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Tim J. Baek (founder of Open WebUI), and 2 more.

llmware by llmware-ai (14k stars, top 0.2%)
Framework for enterprise RAG pipelines using small, specialized models
created 1 year ago · updated 1 week ago