mergekit by arcee-ai

CLI tool for merging pretrained language models, combining strengths without retraining

Created 2 years ago
6,295 stars

Top 8.2% on SourcePulse

Project Summary

Mergekit is a toolkit for merging pre-trained large language models, enabling users to combine the strengths of different models without the computational overhead of ensembling or additional training. It is designed for researchers and practitioners who want to create more versatile and performant models by operating directly in the weight space, even in resource-constrained environments.

How It Works

Mergekit employs an out-of-core approach, allowing it to handle large models with limited RAM and VRAM. It supports a wide array of merging algorithms, including linear interpolation, SLERP, TIES, DARE, and task arithmetic, among others. The toolkit offers fine-grained control over the merging process through YAML configuration files, allowing users to specify which models to merge, how to slice their layers, and how to handle parameters, tokenizers, and chat templates.
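To make two of the listed methods concrete, the following is a minimal sketch of linear interpolation and SLERP applied to flattened weight vectors. It is an illustration of the underlying math only, not mergekit's implementation; the function names `lerp` and `slerp` and the `eps` threshold are assumptions chosen for this example.

```python
import math


def lerp(a, b, t):
    """Linear interpolation between two equal-length weight vectors."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


def slerp(a, b, t, eps=1e-8):
    """Spherical linear interpolation between two weight vectors.

    Falls back to plain linear interpolation when either vector is
    (near) zero or the two vectors are (nearly) parallel or antiparallel,
    because the SLERP weights are ill-defined in those cases.
    """
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a < eps or norm_b < eps:
        return lerp(a, b, t)

    # Angle between the two vectors, clamped for numerical safety.
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if abs(sin_omega) < eps:
        return lerp(a, b, t)

    weight_a = math.sin((1 - t) * omega) / sin_omega
    weight_b = math.sin(t * omega) / sin_omega
    return [weight_a * x + weight_b * y for x, y in zip(a, b)]
```

With two orthogonal unit vectors, `lerp` at the midpoint shrinks the norm while `slerp` preserves it, which is the usual motivation for preferring SLERP when interpolating between two models.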

Quick Start & Requirements

  • Install from source with pip: git clone https://github.com/arcee-ai/mergekit.git && cd mergekit && pip install -e .
  • If the install fails with a missing setup.py error, upgrade pip and retry.
  • Merges can run entirely on CPU, or be GPU-accelerated with as little as 8 GB of VRAM.
  • Official documentation and examples are available in the repository.
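As a rough illustration of what a merge configuration might look like, the sketch below writes a small YAML file for a weighted linear merge. The key names (`models`, `parameters`, `weight`, `merge_method`, `dtype`) follow the layout of the examples published in the repository as best understood here and should be checked against the current documentation; the model identifiers, the file name, and the helper `write_config` are hypothetical placeholders.

```python
from pathlib import Path

# Assumed config layout for a weighted linear merge of two models.
# The model names are hypothetical placeholders, not real repositories.
CONFIG_TEXT = """\
models:
  - model: example-org/model-a
    parameters:
      weight: 0.7
  - model: example-org/model-b
    parameters:
      weight: 0.3
merge_method: linear
dtype: float16
"""


def write_config(path="merge-config.yml"):
    """Write the example configuration to `path` and return it as a Path."""
    target = Path(path)
    target.write_text(CONFIG_TEXT, encoding="utf-8")
    return target
```

The README documents a `mergekit-yaml` command that takes a configuration file and an output directory, so a file produced this way would be passed to it along the lines of `mergekit-yaml merge-config.yml ./merged-model`; consult the repository for the options currently supported.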

Highlighted Details

  • Supports numerous merge methods including linear, SLERP, TIES, DARE, Task Arithmetic, and Mixture of Experts.
  • Offers advanced features like LoRA extraction and piecewise assembly of models ("Frankenmerging").
  • Provides flexible tokenizer and chat template configuration for merged models.
  • Includes an optional GUI via Arcee App and Hugging Face Spaces.
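The task-vector family of methods named above (Task Arithmetic, TIES) can be sketched in a few lines. The code below is a simplified, pure-Python illustration of the TIES idea on flat weight lists: compute each model's difference from a shared base, keep only the largest-magnitude entries, elect a sign per parameter, and average the entries that agree with it. It is not mergekit's code; `task_vector`, `trim`, `ties_merge`, `density`, and `scale` are names assumed for this example.

```python
def task_vector(finetuned, base):
    """Difference between a fine-tuned model's weights and the base weights."""
    return [f - b for f, b in zip(finetuned, base)]


def trim(delta, density):
    """Zero all but the largest-magnitude fraction `density` of entries.

    Entries tied with the cutoff magnitude are all kept, so slightly more
    than the requested fraction can survive.
    """
    keep = max(1, round(len(delta) * density))
    threshold = sorted((abs(d) for d in delta), reverse=True)[keep - 1]
    return [d if abs(d) >= threshold else 0.0 for d in delta]


def ties_merge(base, finetuned_models, density=0.5, scale=1.0):
    """Merge several fine-tuned models into `base` with a TIES-style rule."""
    deltas = [trim(task_vector(m, base), density) for m in finetuned_models]
    merged = []
    for i, base_value in enumerate(base):
        column = [d[i] for d in deltas]
        # Elect the sign carrying the larger total magnitude for this parameter.
        sign_positive = sum(column) >= 0
        agreeing = [v for v in column if v != 0.0 and (v > 0) == sign_positive]
        mean = sum(agreeing) / len(agreeing) if agreeing else 0.0
        merged.append(base_value + scale * mean)
    return merged
```

Plain task arithmetic corresponds to skipping the trimming and sign election and simply adding the scaled task vectors to the base.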

Maintenance & Community

The project is actively maintained by arcee-ai. The README does not describe community channels or a roadmap.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README text. Users should verify licensing for commercial use or closed-source linking.

Limitations & Caveats

The README notes that installation can fail with a missing setup.py error, which is resolved by upgrading pip. Compatibility with model architectures beyond those listed (Llama, Mistral, GPT-NeoX, StableLM) is not detailed.

Health Check

  • Last commit: 23 hours ago
  • Responsiveness: 1 week
  • Pull requests (30d): 3
  • Issues (30d): 6
  • Star history: 104 stars in the last 30 days
