MergeLM by yule-BUAA

Codebase for merging language models via parameter averaging

created 1 year ago
842 stars

Top 43.2% on sourcepulse

View on GitHub
1 Expert Loves This Project
Project Summary

This repository provides a codebase for merging language models, enabling the combination of capabilities from multiple fine-tuned models into a single, more versatile model without retraining. It is targeted at researchers and practitioners working with large language models who want to efficiently aggregate diverse skills. The primary benefit is the ability to create powerful, multi-talented models with minimal computational cost.

How It Works

The core innovation is the DARE (Drop And REscale) operation, which sparsifies the delta parameters (the differences between a fine-tuned model and its pre-trained backbone) by randomly dropping a high proportion of them (up to 99%) and rescaling the survivors by 1 / (1 - drop rate) so the expected update is preserved. The sparsified deltas are then merged with those of other models via parameter averaging or other established techniques such as Task Arithmetic. The approach exploits the observation that the delta parameters produced by fine-tuning are highly redundant, so most of them can be discarded without losing the acquired abilities, allowing efficient consolidation of skills.
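As a rough sketch, not the repository's exact implementation (the function name `dare` and the `drop_rate` value are illustrative), the drop-and-rescale step for a single weight tensor looks like this in PyTorch:

```python
import torch

def dare(finetuned: torch.Tensor, pretrained: torch.Tensor, drop_rate: float = 0.9) -> torch.Tensor:
    """Sparsify the delta (finetuned - pretrained) by random dropping, then rescale the survivors."""
    delta = finetuned - pretrained
    # Keep each delta entry independently with probability (1 - drop_rate).
    keep_mask = torch.bernoulli(torch.full_like(delta, 1.0 - drop_rate))
    # Rescale the kept entries by 1 / (1 - drop_rate) so the expected update stays unchanged.
    return delta * keep_mask / (1.0 - drop_rate)

# A merged weight for this tensor would then be: pretrained + dare(finetuned, pretrained)
```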

Quick Start & Requirements

  • Install: Clone the repository and install dependencies via pip install -r requirements.txt.
  • Prerequisites: PyTorch 2.0.1, transformers 4.33.1, datasets 2.13.1, vllm 0.1.4. Requires access to pre-trained and fine-tuned language models (e.g., LLaMA, WizardLM, WizardMath); a loading sketch follows this list.
  • Resources: Merging decoder-based LMs requires N * tensor_parallel_size GPUs, where N is the number of models being merged. For example, merging two 13B models with tensor_parallel_size=1 requires 2 GPUs.
  • Docs: arXiv Paper, Hugging Face
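To make the prerequisites concrete, here is a minimal sketch, not taken from the repository, of loading a base and a fine-tuned checkpoint with transformers and computing their delta parameters; the model paths are placeholders to replace with checkpoints you have access to:

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder identifiers: substitute local paths or Hugging Face hub IDs.
base = AutoModelForCausalLM.from_pretrained("path/to/llama-base", torch_dtype=torch.float16)
finetuned = AutoModelForCausalLM.from_pretrained("path/to/wizardmath-finetuned", torch_dtype=torch.float16)

base_sd = base.state_dict()
finetuned_sd = finetuned.state_dict()

# Delta parameters: per-tensor difference between fine-tuned and pre-trained weights.
deltas = {name: finetuned_sd[name] - base_sd[name] for name in base_sd}
```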

Highlighted Details

  • DARE can preserve model capabilities while eliminating up to 99% of delta parameters.
  • Merged models demonstrate enhanced performance across multiple tasks (e.g., instruction following, mathematical reasoning, code generation).
  • Supports merging both encoder-based (BERT, RoBERTa) and decoder-based (LLaMA, Code Llama) models.
  • Implements five common merging methods: Average Merging, Task Arithmetic, Fisher Merging, RegMean, and TIES-Merging, all with optional DARE integration (a toy sketch of two of these follows this list).
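For intuition only (function names and signatures are illustrative, not the repository's API), Average Merging and Task Arithmetic reduce to simple per-tensor arithmetic once each model's deltas are available, optionally sparsified with DARE as sketched earlier:

```python
from typing import List

import torch

def average_merging(pretrained: torch.Tensor, deltas: List[torch.Tensor]) -> torch.Tensor:
    """Average Merging: equivalent to averaging the fine-tuned weights themselves."""
    return pretrained + torch.stack(deltas).mean(dim=0)

def task_arithmetic(pretrained: torch.Tensor, deltas: List[torch.Tensor], scaling: float = 1.0) -> torch.Tensor:
    """Task Arithmetic: add a scaled sum of the task vectors (deltas) to the base weights."""
    return pretrained + scaling * torch.stack(deltas).sum(dim=0)

# Per-tensor usage, building on the earlier sketches (names are illustrative):
#   sparsified = [dare(sd[name], base_sd[name], drop_rate=0.9) for sd in finetuned_state_dicts]
#   merged[name] = task_arithmetic(base_sd[name], sparsified, scaling=1.0)
```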

Maintenance & Community

  • The project is associated with the ICML 2024 accepted paper "Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch."
  • Integrations into Hugging Face PEFT and mergekit are noted.
  • Contact: Le Yu (yule@buaa.edu.cn) or via GitHub Issues.

Licensing & Compatibility

  • The repository itself does not explicitly state a license. The associated paper is published by PMLR. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • The effectiveness of DARE depends on the value range of the fine-tuned models' delta parameters; it can fail when fine-tuning shifts the weights by large amounts.
  • Specific instructions are provided for handling potential vllm data parallel group initialization errors.
  • Evaluation for certain tasks (AlpacaEval, HumanEval, MBPP) requires separate installation and execution of external evaluation harnesses.
Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 25 stars in the last 90 days
