Codebase for merging language models via parameter averaging
This repository provides a codebase for merging language models, enabling the combination of capabilities from multiple fine-tuned models into a single, more versatile model without retraining. It is targeted at researchers and practitioners working with large language models who want to efficiently aggregate diverse skills. The primary benefit is the ability to create powerful, multi-talented models with minimal computational cost.
How It Works
The core technique is the DARE (Drop And REscale) operation, which sparsifies the delta parameters (the differences between a fine-tuned model and its base model) by randomly setting a high proportion of them (up to 99%) to zero and rescaling the remaining ones by 1 / (1 - drop rate). The sparsified deltas are then merged using parameter averaging or other established techniques such as Task Arithmetic. This approach exploits the observation that fine-tuned delta parameters are highly redundant, so most can be dropped without degrading performance, which allows the abilities of several models to be consolidated efficiently.
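A minimal sketch of the idea in plain PyTorch is shown below. It is not code from this repository; the function names (dare, dare_merge) and the scaling factor are illustrative assumptions.

```python
import torch


def dare(delta: torch.Tensor, drop_rate: float = 0.9) -> torch.Tensor:
    """Drop And REscale: randomly zero out delta parameters, then rescale survivors."""
    keep_prob = 1.0 - drop_rate
    mask = torch.bernoulli(torch.full_like(delta, keep_prob))
    return delta * mask / keep_prob


def dare_merge(base_state, finetuned_states, drop_rate=0.9, scaling=1.0):
    """Apply DARE to each model's deltas, then combine them by simple averaging."""
    merged = {}
    for name, base_param in base_state.items():
        deltas = [dare(ft[name] - base_param, drop_rate) for ft in finetuned_states]
        merged[name] = base_param + scaling * torch.stack(deltas).mean(dim=0)
    return merged
```

Here base_state and finetuned_states would be state dicts (e.g., from model.state_dict()) that share identical keys and shapes.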
Quick Start & Requirements
pip install -r requirements.txt
Merging requires N * tensor_parallel_size GPUs, where N is the number of models being merged. For example, merging two 13B models with tensor_parallel_size=1 requires 2 GPUs.
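For context, the sketch below (not taken from this repository; the model path is a placeholder) shows where tensor_parallel_size enters when serving a single model with vllm.

```python
from vllm import LLM, SamplingParams

# Illustrative only: one fine-tuned model served by one vllm engine.
# tensor_parallel_size is the number of GPUs used for this single model, so
# handling N models this way needs N * tensor_parallel_size GPUs overall.
llm = LLM(model="path/to/finetuned-13b-model", tensor_parallel_size=1)  # placeholder path
outputs = llm.generate(["Merge check prompt"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```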
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Using vllm can trigger data parallel group initialization errors.