evolutionary-model-merge by SakanaAI

Research code for model merging via evolutionary optimization

Created 1 year ago
1,364 stars

Top 29.5% on SourcePulse

Project Summary

This repository provides code and models for SakanaAI's Evolutionary Model Merge series, focusing on optimizing model merging recipes for improved performance. It targets researchers and developers working with large language and vision-language models, offering a method to create superior merged models from existing ones.

How It Works

The project employs an evolutionary optimization approach to discover effective model merging strategies. It iteratively merges base models using a defined recipe and evaluates the resulting merged model against specific benchmarks. The process then selects and combines the best-performing merges to generate the next generation of recipes, aiming to discover optimal merging configurations that outperform individual source models.
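The merge, evaluate, select, and recombine loop described above can be illustrated with a toy sketch. This is not the repository's implementation: the "models" are short lists of floats, the merge recipe is a plain weighted average, `toy_fitness` is a hypothetical stand-in for a benchmark score, and the search is a simple elitist evolutionary strategy chosen for brevity. All function and variable names here are assumptions made for illustration.

```python
import random


def merge(models, weights):
    """Weighted average of parameter vectors (a stand-in for a real merge recipe)."""
    total = sum(weights)
    return [
        sum(w * m[i] for w, m in zip(weights, models)) / total
        for i in range(len(models[0]))
    ]


def evolve_recipe(models, fitness, generations=30, population=16,
                  elite=4, sigma=0.1, seed=0):
    """Search for mixing weights that maximize fitness(merged_model)."""
    rng = random.Random(seed)
    n = len(models)
    # Each recipe is a list of positive mixing weights, one per source model.
    recipes = [[rng.uniform(0.05, 1.0) for _ in range(n)]
               for _ in range(population)]
    for _ in range(generations):
        # Evaluate every recipe by scoring the model it produces.
        ranked = sorted(recipes,
                        key=lambda r: fitness(merge(models, r)),
                        reverse=True)
        parents = ranked[:elite]  # keep the best recipes unchanged
        children = []
        while len(children) < population - elite:
            a, b = rng.sample(parents, 2)
            # Recombine two parents, then perturb with Gaussian noise.
            child = [(x + y) / 2 + rng.gauss(0, sigma) for x, y in zip(a, b)]
            children.append([max(1e-3, w) for w in child])  # keep weights positive
        recipes = parents + children
    best = max(recipes, key=lambda r: fitness(merge(models, r)))
    return best, fitness(merge(models, best))


# Toy setup: two "source models", each good on a different coordinate.
models = [[1.0, 0.0], [0.0, 1.0]]
target = [0.5, 0.5]  # hypothetical optimum that neither source reaches alone


def toy_fitness(params):
    """Negative squared distance to the target; higher is better."""
    return -sum((p - t) ** 2 for p, t in zip(params, target))


best_recipe, best_score = evolve_recipe(models, toy_fitness)
print("best recipe:", best_recipe)
print("merged score:", best_score)
print("source scores:", [toy_fitness(m) for m in models])
```

In this toy setting any positive mix of the two sources scores higher than either source alone, which mirrors the goal stated above of finding merges that outperform the individual source models. A real run would replace `merge` with an actual merging method and `toy_fitness` with a benchmark evaluation.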

Quick Start & Requirements

  • Install: pip install -e .
  • Prerequisites: Python 3.10.12 and CUDA 12.3. The lid.176.ftz fastText model must be downloaded separately.
  • Evaluation: python evaluate.py --config_path {path-to-config}
  • More Info: Models, Demo, Paper, Blog

Highlighted Details

  • EvoLLM-JP-v1-7B achieves 52.0 accuracy on MGSM-JA and 70.5 on lm-eval-harness, significantly outperforming source models like Shisa Gamma 7B v1 (9.6, 66.1) and WizardMath 7B V1.1 (18.4, 60.1).
  • EvoVLM-JP-v1-7B shows strong performance in vision-language tasks, achieving 19.70 ROUGE-L on JA-VG-VQA-500 and 51.25 ROUGE-L on JA-VLM-Bench-In-the-Wild.
  • Offers models under both the Microsoft Research License and the Apache 2.0 license.

Maintenance & Community

  • Active development with ongoing updates and additions planned.
  • Twitter

Licensing & Compatibility

  • Models are available under the Microsoft Research License and Apache 2.0. The Apache 2.0 license permits commercial use.

Limitations & Caveats

The project explicitly states that tests were conducted with Python 3.10.12 and CUDA 12.3, and compatibility is not guaranteed in other environments.
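As a small illustration of guarding against an untested environment, the sketch below compares the running interpreter with the Python version named above. It is an assumption-laden helper, not part of the repository: `TESTED_PYTHON` and `matches_tested_python` are hypothetical names, and the CUDA version is not checked because doing so depends on which GPU library is installed.

```python
import sys

# Version reported as tested in the caveat above.
TESTED_PYTHON = (3, 10, 12)


def matches_tested_python(version=None):
    """Return True if major, minor, and micro equal the tested release."""
    if version is None:
        version = sys.version_info
    return tuple(version[:3]) == TESTED_PYTHON


if not matches_tested_python():
    running = ".".join(str(part) for part in sys.version_info[:3])
    print(f"Warning: running Python {running}; tests were reported on 3.10.12.")
```

The check only warns rather than exiting, since the caveat says other environments are not guaranteed to work, not that they are known to fail.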

Health Check

  • Last commit: 9 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 11 stars in the last 30 days
