MergeLM by yule-BUAA

Codebase for merging language models via parameter averaging

Created 1 year ago
850 stars

Top 42.0% on SourcePulse

View on GitHub
Project Summary

This repository provides a codebase for merging language models, enabling the combination of capabilities from multiple fine-tuned models into a single, more versatile model without retraining. It is targeted at researchers and practitioners working with large language models who want to efficiently aggregate diverse skills. The primary benefit is the ability to create powerful, multi-talented models with minimal computational cost.

How It Works

The core innovation is the DARE (Drop And REscale) operation, which sparsifies the delta parameters (the differences between fine-tuned and pre-trained weights) by randomly dropping a high proportion of them (up to 99%) and rescaling the survivors by 1 / (1 - drop rate). The DARE-processed models are then merged using parameter averaging or other established techniques such as Task Arithmetic. This approach exploits the observation that supervised fine-tuning produces highly redundant delta parameters, so most of them can be discarded without losing the acquired abilities.
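
As a rough illustration of the drop-and-rescale idea, the sketch below applies DARE to the delta parameters of a single fine-tuned model. It is not the repository's actual interface; the function name, the drop_rate argument, and the state-dict inputs are assumptions made for the example.

```python
import torch

def dare(base_state, finetuned_state, drop_rate=0.9):
    """Drop-and-rescale sketch: zero out most delta parameters, rescale the rest."""
    processed = {}
    for name, base_param in base_state.items():
        delta = finetuned_state[name] - base_param            # delta parameters
        keep_prob = 1.0 - drop_rate
        mask = torch.bernoulli(torch.full_like(delta, keep_prob))
        delta = delta * mask / keep_prob                      # drop, then rescale
        processed[name] = base_param + delta                  # rebuild full weights
    return processed
```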

Quick Start & Requirements

  • Install: Clone the repository and install dependencies via pip install -r requirements.txt.
  • Prerequisites: PyTorch 2.0.1, transformers 4.33.1, datasets 2.13.1, vllm 0.1.4. Requires access to pre-trained and fine-tuned language models (e.g., LLaMA, WizardLM, WizardMath); a loading sketch follows this list.
  • Resources: Merging decoder-based LMs requires N * tensor_parallel_size GPUs, where N is the number of models being merged. For example, merging two 13B models with tensor_parallel_size=1 requires 2 GPUs.
  • Docs: arXiv Paper, Hugging Face
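
For context on the prerequisites above, one plausible way to obtain the state dicts that a DARE-style routine would consume is shown below. The checkpoint paths are placeholders, and this is not the repository's own loading code.

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder checkpoints; substitute the actual base and fine-tuned models to merge.
base = AutoModelForCausalLM.from_pretrained("path/to/base-llama-13b",
                                            torch_dtype=torch.float16)
tuned = AutoModelForCausalLM.from_pretrained("path/to/wizardmath-13b",
                                             torch_dtype=torch.float16)

base_state = base.state_dict()
tuned_state = tuned.state_dict()
# These state dicts can be fed to a DARE-style routine such as the dare() sketch above.
```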

Highlighted Details

  • DARE can preserve model capabilities while eliminating up to 99% of delta parameters.
  • Merged models demonstrate enhanced performance across multiple tasks (e.g., instruction following, mathematical reasoning, code generation).
  • Supports merging both encoder-based (BERT, RoBERTa) and decoder-based (LLaMA, Code Llama) models.
  • Implements five common merging methods: Average Merging, Task Arithmetic, Fisher Merging, RegMean, and TIES-Merging, all of which can be combined with DARE; a task-arithmetic sketch follows this list.
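
As a sketch of how one of these methods (Task Arithmetic) can be applied to DARE-processed models, assuming all state dicts share the base model's keys. The function name and its scaling argument are illustrative, not the repository's interface.

```python
import torch

def task_arithmetic_merge(base_state, model_states, scaling=1.0):
    """Sketch of Task Arithmetic over (optionally DARE-processed) models."""
    merged = {}
    for name, base_param in base_state.items():
        task_vector_sum = torch.zeros_like(base_param)
        for state in model_states:
            task_vector_sum += state[name] - base_param   # per-model task vector
        # Average Merging would instead use task_vector_sum / len(model_states).
        merged[name] = base_param + scaling * task_vector_sum
    return merged

# e.g. merged_state = task_arithmetic_merge(
#     base_state, [dare(base_state, s) for s in tuned_states])
```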

Maintenance & Community

  • The project is associated with the ICML 2024 accepted paper "Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch."
  • Integrations into Hugging Face PEFT and mergekit are noted.
  • Contact: Le Yu (yule@buaa.edu.cn) or via GitHub Issues.

Licensing & Compatibility

  • The repository itself does not explicitly state a license. The associated paper is published by PMLR. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • The effectiveness of DARE depends on the value range of the fine-tuned models' delta parameters; it degrades when fine-tuning shifts the base weights by large amounts.
  • Specific instructions are provided for handling potential vllm data parallel group initialization errors.
  • Evaluation for certain tasks (AlpacaEval, HumanEval, MBPP) requires separate installation and execution of external evaluation harnesses.
Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 30 days

Explore Similar Projects

Starred by Georgi Gerganov (author of llama.cpp, whisper.cpp), Alex Yu (Research Scientist at OpenAI; former cofounder of Luma AI), and 13 more.

Qwen3 by QwenLM

  • Large language model series by Qwen team, Alibaba Cloud
  • Top 0.4% on SourcePulse
  • 25k stars
  • Created 1 year ago; updated 2 weeks ago