evolutionary-model-merge  by SakanaAI

Model merging via evolutionary optimization research

created 1 year ago
1,353 stars

Top 30.3% on sourcepulse

Project Summary

This repository provides code and models for SakanaAI's Evolutionary Model Merge series, focusing on optimizing model merging recipes for improved performance. It targets researchers and developers working with large language and vision-language models, offering a method to create superior merged models from existing ones.

How It Works

The project employs an evolutionary optimization approach to discover effective model merging strategies. It iteratively merges base models using a defined recipe and evaluates the resulting merged model against specific benchmarks. The process then selects and combines the best-performing merges to generate the next generation of recipes, aiming to discover optimal merging configurations that outperform individual source models.
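
The loop described above (merge, evaluate, select, recombine) can be illustrated with a toy sketch. This is not the repository's code: the summary does not say which optimizer or merge operators the project uses, so the sketch assumes a simple elitist evolutionary search, a weighted average of small parameter vectors as a stand-in for a real merge recipe, and a made-up benchmark. The names `merge`, `fitness`, and `evolve` are hypothetical.

```python
import random

def merge(models, weights):
    """Weighted average of parameter vectors (a stand-in for a real merge recipe)."""
    total = sum(weights)
    return [
        sum(w * m[i] for w, m in zip(weights, models)) / total
        for i in range(len(models[0]))
    ]

def fitness(params, target):
    """Toy benchmark: negative squared distance to a target vector (higher is better)."""
    return -sum((p - t) ** 2 for p, t in zip(params, target))

def evolve(models, target, pop_size=20, generations=40, elite=5, sigma=0.1, seed=0):
    """Search for merge weights with a simple elitist evolutionary loop."""
    rng = random.Random(seed)

    def score(recipe):
        return fitness(merge(models, recipe), target)

    # Each recipe is one strictly positive weight per source model.
    population = [
        [rng.random() + 1e-6 for _ in models] for _ in range(pop_size)
    ]
    for _ in range(generations):
        # Evaluate every recipe and keep the best ones unchanged (elitism).
        population.sort(key=score, reverse=True)
        parents = population[:elite]
        children = []
        while len(children) < pop_size - elite:
            # Recombine two distinct parents, then mutate with Gaussian noise.
            a, b = rng.sample(parents, 2)
            child = [
                max(1e-6, (x + y) / 2 + rng.gauss(0, sigma))
                for x, y in zip(a, b)
            ]
            children.append(child)
        population = parents + children
    best = max(population, key=score)
    return best, score(best)

# Two toy "source models" and a benchmark target that neither matches alone.
models = [[1.0, 0.0], [0.0, 1.0]]
target = [0.3, 0.7]
best_recipe, best_score = evolve(models, target)
source_scores = [fitness(m, target) for m in models]
print("best recipe:", best_recipe)
print("merged score:", best_score, "source scores:", source_scores)
```

Because the top recipes survive each generation unchanged, the best score never gets worse over time. In a real setting the fitness function would be a benchmark evaluation of the merged model, which is far more expensive than this toy distance.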

Quick Start & Requirements

  • Install: pip install -e .
  • Prerequisites: Python 3.10.12, CUDA 12.3. Requires downloading the lid.176.ftz fastText language-identification model.
  • Evaluation: python evaluate.py --config_path {path-to-config}
  • More Info: Models, Demo, Paper, Blog

Highlighted Details

  • EvoLLM-JP-v1-7B scores 52.0 on MGSM-JA (accuracy) and 70.5 on lm-eval-harness, well above its source models Shisa Gamma 7B v1 (9.6 and 66.1, respectively) and WizardMath 7B V1.1 (18.4 and 60.1).
  • EvoVLM-JP-v1-7B shows strong performance in vision-language tasks, achieving 19.70 ROUGE-L on JA-VG-VQA-500 and 51.25 ROUGE-L on JA-VLM-Bench-In-the-Wild.
  • Models are released under either the Microsoft Research License or the Apache 2.0 license.

Maintenance & Community

  • Active development with ongoing updates and additions planned.
  • Twitter

Licensing & Compatibility

  • Models are available under the Microsoft Research License or Apache 2.0. Apache 2.0 permits commercial use.

Limitations & Caveats

The project explicitly states that tests were conducted with Python 3.10.12 and CUDA 12.3, and compatibility is not guaranteed in other environments.
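
Since only one environment is reported as tested, a small pre-flight check can flag a mismatch before running an evaluation. The sketch below covers the Python version only; the helper name `python_matches_tested` is hypothetical and is not part of the repository. Checking the CUDA version would need a GPU library and is left out.

```python
import sys

# Version the project reports testing with (from the prerequisites above).
TESTED_PYTHON = (3, 10, 12)

def python_matches_tested(version_info=None, tested=TESTED_PYTHON):
    """Return True when (major, minor, micro) equals the tested version."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:3]) == tuple(tested)

if not python_matches_tested():
    current = ".".join(str(part) for part in sys.version_info[:3])
    expected = ".".join(str(part) for part in TESTED_PYTHON)
    print(
        f"Warning: running Python {current}; the project was tested with {expected}.",
        file=sys.stderr,
    )
```

A mismatch prints a warning rather than raising an error, because the project only says that other environments are untested, not that they fail.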

Health Check

  • Last commit: 8 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 39 stars in the last 90 days
