mergekit by arcee-ai

CLI tool for merging pretrained language models, combining strengths without retraining

Created 2 years ago

6,674 stars

Top 7.6% on SourcePulse

Project Summary

Mergekit is a toolkit for merging pre-trained large language models, enabling users to combine the strengths of different models without the computational overhead of ensembling or additional training. It is designed for researchers and practitioners who want to create more versatile and performant models by operating directly in the weight space, even in resource-constrained environments.

How It Works

Mergekit employs an out-of-core approach, allowing it to handle large models with limited RAM and VRAM. It supports a wide array of merging algorithms, including linear interpolation, SLERP, TIES, DARE, and task arithmetic, among others. The toolkit offers fine-grained control over the merging process through YAML configuration files, allowing users to specify which models to merge, how to slice their layers, and how to handle parameters, tokenizers, and chat templates.
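To make the difference between two of the listed methods concrete, here is a minimal sketch of linear interpolation and SLERP applied to flat weight vectors. It is an illustration under simplifying assumptions, not mergekit's implementation: the function names lerp and slerp are hypothetical, tensors are modeled as plain Python lists, and the fallback to linear interpolation for near-parallel vectors is one common convention.

```python
import math


def lerp(a, b, t):
    """Linear interpolation: a weighted average of the two weight vectors."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def slerp(a, b, t, eps=1e-8):
    """Spherical linear interpolation between two flat weight vectors.

    The angle is measured between the directions of a and b; the
    interpolation weights are then applied to the original vectors.
    """
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a < eps or norm_b < eps:
        # No meaningful direction for a zero vector: fall back to lerp.
        return lerp(a, b, t)

    # Cosine of the angle between the two vectors, clamped for acos.
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if abs(sin_omega) < eps:
        # Vectors are (anti)parallel: the spherical formula is unstable.
        return lerp(a, b, t)

    weight_a = math.sin((1.0 - t) * omega) / sin_omega
    weight_b = math.sin(t * omega) / sin_omega
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]


if __name__ == "__main__":
    a, b = [1.0, 0.0], [0.0, 1.0]
    print(lerp(a, b, 0.5))   # midpoint on the straight line between a and b
    print(slerp(a, b, 0.5))  # midpoint on the arc between a and b
```

For two orthogonal unit vectors, the linear midpoint has a smaller norm than either input, while the SLERP midpoint keeps unit norm. That norm-preserving behavior is the usual motivation for choosing SLERP when interpolating between two models.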

Quick Start & Requirements

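Install from source: git clone https://github.com/arcee-ai/mergekit.git && cd mergekit && pip install -e .
If the install fails with an error that setup.py is not found, upgrade pip and rerun the command.
Merges can run entirely on CPU, or be accelerated with a GPU that has as little as 8 GB of VRAM.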
Official documentation and examples are available in the repository.
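As a rough idea of what a first merge configuration might look like, the sketch below builds one as a Python dictionary and prints it as JSON, which YAML parsers generally accept. Everything specific here is an assumption to check against the repository's own examples: the field names (merge_method, base_model, slices, sources, layer_range, parameters, dtype) are written from memory of those examples, the model identifiers are hypothetical placeholders, and the 32-layer range assumes a 32-layer architecture.

```python
import json

# Hypothetical model identifiers: replace with real Hugging Face repos
# or local paths. The layer count (32) is an assumption about the models.
config = {
    "merge_method": "slerp",
    "base_model": "example-org/model-a",
    "slices": [
        {
            "sources": [
                {"model": "example-org/model-a", "layer_range": [0, 32]},
                {"model": "example-org/model-b", "layer_range": [0, 32]},
            ]
        }
    ],
    # Interpolation factor: 0 keeps the base model, 1 keeps the other model.
    "parameters": {"t": 0.5},
    "dtype": "bfloat16",
}

# Serialize as JSON; the text can be saved under a .yml name because
# YAML parsers generally read JSON syntax.
config_text = json.dumps(config, indent=2)
print(config_text)
```

Saving the printed text to a file and passing it to the repository's mergekit-yaml command, together with an output directory, is the usual next step. Confirm the exact invocation and options in the repository documentation before relying on it.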

Highlighted Details

Supports numerous merge methods including linear, SLERP, TIES, DARE, Task Arithmetic, and Mixture of Experts.
Offers advanced features like LoRA extraction and piecewise assembly of models ("Frankenmerging").
Provides flexible tokenizer and chat template configuration for merged models.
Includes an optional GUI via Arcee App and Hugging Face Spaces.
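Task arithmetic and DARE, both listed above, operate on the difference between a fine-tuned model and its base. The following is a minimal sketch of that idea on flat weight lists, not mergekit's code: the function names, the per-model weights, and the seeded random generator are all illustrative assumptions.

```python
import random


def task_vector(finetuned, base):
    """Delta between a fine-tuned model and the base it was trained from."""
    return [f - b for f, b in zip(finetuned, base)]


def dare(delta, drop_rate, rng):
    """DARE-style sparsification: randomly zero a fraction of the delta
    and rescale the surviving entries by 1 / (1 - drop_rate)."""
    if not 0.0 <= drop_rate < 1.0:
        raise ValueError("drop_rate must be in [0, 1)")
    keep = 1.0 - drop_rate
    return [d / keep if rng.random() >= drop_rate else 0.0 for d in delta]


def task_arithmetic(base, finetuned_models, weights, drop_rate=0.0, seed=0):
    """Add weighted task vectors back onto the base weights.

    With drop_rate > 0, each task vector is sparsified with dare() first.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    merged = list(base)
    for model, weight in zip(finetuned_models, weights):
        delta = task_vector(model, base)
        if drop_rate > 0.0:
            delta = dare(delta, drop_rate, rng)
        merged = [m + weight * d for m, d in zip(merged, delta)]
    return merged


if __name__ == "__main__":
    base = [1.0, 2.0, 3.0]
    model_a = [1.5, 2.0, 3.0]  # changed only the first weight
    model_b = [1.0, 2.0, 4.0]  # changed only the last weight
    print(task_arithmetic(base, [model_a, model_b], [1.0, 0.5]))
```

The rescaling in dare() keeps the expected value of each delta entry unchanged, which is why a large share of the delta can be dropped before merging. TIES addresses the same interference problem differently, by trimming small deltas and resolving sign conflicts between models; it is not shown here.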

Maintenance & Community

The project is actively maintained by arcee-ai. The README does not describe community channels or a roadmap.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README text. Users should verify licensing for commercial use or closed-source linking.

Limitations & Caveats

The README mentions a potential installation issue if setup.py is missing, requiring an upgraded pip version. Specific compatibility notes for various model architectures beyond those listed (Llama, Mistral, GPT-NeoX, StableLM) are not detailed.

Health Check

Last Commit

1 week ago

Responsiveness

1 week

Star History

107 stars in the last 30 days