evolutionary-model-merge by SakanaAI

Model merging via evolutionary optimization research

Created 1 year ago

1,394 stars

Top 28.9% on SourcePulse

3 Experts Love This Project

omarsar (Founder of DAIR.AI)
philschmid (DevRel at Google DeepMind)
vincentweisser (Vincent Weisser, Cofounder of Prime Intellect)

Project Summary

This repository provides code and models for SakanaAI's Evolutionary Model Merge series, focusing on optimizing model merging recipes for improved performance. It targets researchers and developers working with large language and vision-language models, offering a method to create superior merged models from existing ones.

How It Works

The project employs an evolutionary optimization approach to discover effective model merging strategies. It iteratively merges base models using a defined recipe and evaluates the resulting merged model against specific benchmarks. The process then selects and combines the best-performing merges to generate the next generation of recipes, aiming to discover optimal merging configurations that outperform individual source models.
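The loop described above can be sketched with a toy example. This is a minimal illustration under stated assumptions, not the repository's implementation: the "models" are short parameter vectors, the "recipe" is a list of mixing weights, the "benchmark" is closeness to a hypothetical target vector, and the optimizer is a simple elitist genetic algorithm chosen for readability. All function names here (merge, fitness, evolve) are invented for the sketch.

```python
import random


def merge(models, weights):
    """Weighted average of parameter vectors (toy stand-in for a merging recipe)."""
    total = sum(weights)
    return [
        sum(w * m[i] for w, m in zip(weights, models)) / total
        for i in range(len(models[0]))
    ]


def fitness(params, target):
    """Toy benchmark score, higher is better: negative squared error to a target."""
    return -sum((p - t) ** 2 for p, t in zip(params, target))


def evolve(models, target, pop_size=20, elite=5, generations=30, sigma=0.1, seed=0):
    """Search for mixing weights whose merged model scores best on the toy benchmark."""
    rng = random.Random(seed)
    n = len(models)

    def score(recipe):
        return fitness(merge(models, recipe), target)

    # Each recipe holds one positive mixing weight per source model.
    population = [[rng.uniform(0.01, 1.0) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluate every recipe and keep the best-performing ones.
        parents = sorted(population, key=score, reverse=True)[:elite]
        children = []
        while len(children) < pop_size - elite:
            a, b = rng.sample(parents, 2)
            # Crossover picks each weight from one parent; mutation adds Gaussian noise.
            child = [max(1e-3, rng.choice(pair) + rng.gauss(0.0, sigma)) for pair in zip(a, b)]
            children.append(child)
        # Elitism: parents survive unchanged, so the best score never decreases.
        population = parents + children
    best = max(population, key=score)
    return best, score(best)


# Two hypothetical source models and a target that neither matches on its own.
source_models = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
target = [0.7, 0.3, 0.0]
best_recipe, best_score = evolve(source_models, target)
print(best_recipe, best_score)
```

In this toy setting the merged vector can score higher than either source vector alone, which mirrors the stated goal of finding recipes that outperform the individual source models. Real merging operates on full model weights and real benchmarks, which this sketch does not attempt.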

Quick Start & Requirements

Install: pip install -e .
Prerequisites: Python 3.10.12, CUDA 12.3. Requires downloading the lid.176.ftz fastText model.
Evaluation: python evaluate.py --config_path {path-to-config}
More Info: Models, Demo, Paper, Blog
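As a convenience, the steps above could be wrapped in a small preflight script. This is only a sketch: the helper names (python_matches_tested, build_eval_command) and the config path are hypothetical, and the script assumes it is run from the repository root, where evaluate.py is expected to live.

```python
import sys

# Python version the project reports testing against.
TESTED_PYTHON = (3, 10, 12)


def python_matches_tested(version_info=sys.version_info):
    """Return True if the interpreter version equals the tested version."""
    return tuple(version_info[:3]) == TESTED_PYTHON


def build_eval_command(config_path):
    """Build the documented evaluation command as an argument list."""
    return [sys.executable, "evaluate.py", "--config_path", config_path]


if not python_matches_tested():
    print("Warning: this interpreter differs from the tested Python 3.10.12.")

# "path/to/config.yaml" is a placeholder, not a file known to exist in the repository.
cmd = build_eval_command("path/to/config.yaml")
print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once a real config path is supplied.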

Highlighted Details

EvoLLM-JP-v1-7B achieves 52.0 accuracy on MGSM-JA and 70.5 on lm-eval-harness, outperforming source models such as Shisa Gamma 7B v1 (9.6 and 66.1, respectively) and WizardMath 7B V1.1 (18.4 and 60.1).
EvoVLM-JP-v1-7B targets vision-language tasks, achieving 19.70 ROUGE-L on JA-VG-VQA-500 and 51.25 ROUGE-L on JA-VLM-Bench-In-the-Wild.
Offers models under two licenses: the Microsoft Research License and Apache 2.0.
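To make the size of the reported gap concrete, the margins between EvoLLM-JP-v1-7B and the two listed source models can be computed directly from the figures above. The snippet uses only the numbers quoted in this section; the dictionary layout and variable names are illustrative.

```python
# (MGSM-JA accuracy, lm-eval-harness score) as quoted in this section.
scores = {
    "EvoLLM-JP-v1-7B": (52.0, 70.5),
    "Shisa Gamma 7B v1": (9.6, 66.1),
    "WizardMath 7B V1.1": (18.4, 60.1),
}

merged_mgsm, merged_lmeh = scores["EvoLLM-JP-v1-7B"]

# Margin of the merged model over each source model, rounded to one decimal place.
margins = {
    name: (round(merged_mgsm - mgsm, 1), round(merged_lmeh - lmeh, 1))
    for name, (mgsm, lmeh) in scores.items()
    if name != "EvoLLM-JP-v1-7B"
}

for name, (d_mgsm, d_lmeh) in margins.items():
    print(f"{name}: +{d_mgsm} MGSM-JA, +{d_lmeh} lm-eval-harness")
```

The gain is far larger on MGSM-JA (+42.4 and +33.6 points) than on lm-eval-harness (+4.4 and +10.4 points).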

Maintenance & Community

The project is described as under active development, with further updates and additions planned.
Twitter

Licensing & Compatibility

Models are released under the Microsoft Research License or Apache 2.0. Apache 2.0 permits commercial use.

Limitations & Caveats

The project states that it was tested only with Python 3.10.12 and CUDA 12.3; compatibility with other environments is not guaranteed.

Health Check

Last Commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0

Star History

6 stars in the last 30 days

Explore Similar Projects

llm_benchmark by llm2014

LLM evaluation benchmark tracking model evolution

Created 11 months ago

Updated 1 day ago

Mengzi3 by Langboat

LLM for multilingual generation, especially Chinese

Created 1 year ago

Updated 1 year ago

Starred by Yaowei Zheng (Author of LLaMA-Factory) and Jeff Hammerbacher (Cofounder of Cloudera).

mend by eric-mitchell

Fast model editing for LLMs

Created 4 years ago

Updated 2 years ago

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems").

awesome-japanese-llm by llm-jp

Japanese LLM list: models, benchmarks, datasets

Created 2 years ago

Updated 1 day ago

Starred by Jeremy Howard (Cofounder of fast.ai).

Xwin-LM by Xwin-LM

LLM for alignment research, fine-tuning, and open-source contribution

Created 2 years ago

Updated 1 year ago

Starred by Johannes Hagemann (Cofounder of Prime Intellect), Wing Lian (Founder of Axolotl AI), and 2 more.

MergeLM by yule-BUAA

Codebase for merging language models via parameter averaging

Created 2 years ago

Updated 1 year ago

Starred by Burkay Gur (Cofounder of Fal.ai).

awesome-vlm-architectures by gokayfem

Vision-language models and their architectures

Created 1 year ago

Updated 10 months ago

Starred by Michael Han (Cofounder of Unsloth) and Omar Sanseviero (DevRel at Google DeepMind).

pruna by PrunaAI

Model optimization framework for faster, smaller, cheaper, greener AI

Created 10 months ago

Updated 1 day ago

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Ji Yichao (Cofounder of Manus), and 1 more.

OLMoE by allenai

Open MoE language model research paper

Created 1 year ago

Updated 3 months ago

Starred by Vincent Weisser (Cofounder of Prime Intellect), Travis Fischer (Founder of Agentic), and 2 more.

OpenELM by CarperAI

OpenELM: evolutionary search with language models in code and natural language

Created 3 years ago

Updated 2 years ago

Starred by Zack Li (Cofounder of Nexa AI) and Piero Molino (Cofounder of Predibase).

Efficient-LLMs-Survey by AIoT-MLSys-Lab

Survey paper on efficient large language models

Created 2 years ago

Updated 6 months ago

Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Ying Sheng (Coauthor of SGLang), and 18 more.

mergekit by arcee-ai

CLI tool for merging pretrained language models, combining strengths without retraining

Created 2 years ago

Updated 1 week ago
