Codebase for merging language models via parameter averaging
This repository provides a codebase for merging language models, enabling the combination of capabilities from multiple fine-tuned models into a single, more versatile model without retraining. It is targeted at researchers and practitioners working with large language models who want to efficiently aggregate diverse skills. The primary benefit is the ability to create powerful, multi-talented models with minimal computational cost.
How It Works
The core technique is the DARE (Drop And REscale) operation, which sparsifies the delta parameters (the differences between a fine-tuned model and its base model) by randomly setting a high proportion of them (up to 99%) to zero and rescaling the remaining ones by 1 / (1 - drop rate). The sparsified deltas are then merged using parameter averaging or other established techniques such as Task Arithmetic. This approach exploits the observation that fine-tuned delta parameters are highly redundant, so most can be dropped without degrading performance, which allows the abilities of several models to be consolidated efficiently.
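A minimal sketch of the idea in plain PyTorch is shown below. It is not code from this repository; the function names (dare, dare_merge) and the scaling factor are illustrative assumptions.

```python
import torch


def dare(delta: torch.Tensor, drop_rate: float = 0.9) -> torch.Tensor:
    """Drop And REscale: randomly zero out delta parameters, then rescale survivors."""
    keep_prob = 1.0 - drop_rate
    mask = torch.bernoulli(torch.full_like(delta, keep_prob))
    return delta * mask / keep_prob


def dare_merge(base_state, finetuned_states, drop_rate=0.9, scaling=1.0):
    """Apply DARE to each model's deltas, then combine them by simple averaging."""
    merged = {}
    for name, base_param in base_state.items():
        deltas = [dare(ft[name] - base_param, drop_rate) for ft in finetuned_states]
        merged[name] = base_param + scaling * torch.stack(deltas).mean(dim=0)
    return merged
```

Here base_state and finetuned_states would be state dicts (e.g., from model.state_dict()) that share identical keys and shapes.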
Quick Start & Requirements
pip install -r requirements.txt
Merging requires N * tensor_parallel_size GPUs, where N is the number of models being merged. For example, merging two 13B models with tensor_parallel_size=1 requires 2 GPUs.
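For context, the sketch below (not taken from this repository; the model path is a placeholder) shows where tensor_parallel_size enters when serving a single model with vllm.

```python
from vllm import LLM, SamplingParams

# Illustrative only: one fine-tuned model served by one vllm engine.
# tensor_parallel_size is the number of GPUs used for this single model, so
# handling N models this way needs N * tensor_parallel_size GPUs overall.
llm = LLM(model="path/to/finetuned-13b-model", tensor_parallel_size=1)  # placeholder path
outputs = llm.generate(["Merge check prompt"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```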
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Using vllm can trigger data parallel group initialization errors.