es-fine-tuning-paper by VsonicV

Evolution Strategies for LLM Fine-Tuning

Created 5 months ago

320 stars

Top 85.0% on SourcePulse

Project Summary

This repository provides the source code for the paper "Evolution Strategies at Scale: LLM Fine-Tuning Beyond Reinforcement Learning." It addresses the challenge of fine-tuning large language models (LLMs) by employing Evolution Strategies (ES) to directly optimize billions of parameters, offering a novel alternative to traditional reinforcement learning methods. The project targets researchers and engineers seeking to scale LLM optimization efficiently.

How It Works

The core idea is to apply Evolution Strategies (ES) directly to the parameters of an LLM. Instead of backpropagating gradients, as RL-based fine-tuning does, ES treats fine-tuning as a black-box search problem. In each iteration it generates a population of model variants by perturbing the current parameters with random noise, scores each variant on the target task, and uses those scores to move the parameters toward better-performing variants. The approach is designed to scale to models with billions of parameters and may offer a more direct and computationally tractable path to fine-tuning than complex RL setups.
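The loop described above can be sketched on a toy problem. This is a minimal illustration of a generic ES update, not the repository's implementation: the `reward` function, the five-dimensional parameter vector, and every hyperparameter (`population`, `sigma`, `lr`) are assumptions chosen so the example runs in a fraction of a second with NumPy.

```python
import numpy as np


def reward(theta: np.ndarray) -> float:
    """Toy stand-in for a task score: higher is better, best at theta == 3."""
    return -float(np.sum((theta - 3.0) ** 2))


def es_step(theta, rng, population=32, sigma=0.1, lr=0.01):
    """One ES iteration: perturb, evaluate, move toward better variants."""
    # One Gaussian noise vector per population member (assumed noise model).
    noise = rng.standard_normal((population, theta.size))
    # Score every perturbed copy of the parameters; no gradients are computed.
    scores = np.array([reward(theta + sigma * eps) for eps in noise])
    # Z-score the rewards so the update does not depend on the reward scale.
    scores = (scores - scores.mean()) / (scores.std() + 1e-8)
    # The reward-weighted average of the noise estimates an ascent direction.
    return theta + lr / (population * sigma) * noise.T @ scores


def run_es(steps=300, dim=5, seed=0):
    """Run the toy search from an all-zero starting point."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(dim)
    for _ in range(steps):
        theta = es_step(theta, rng)
    return theta
```

Calling `run_es()` should return a vector whose entries are close to 3. In an LLM setting `theta` would be the model weights and `reward` a task-level score, which is where the cost of evaluating each population member comes from.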

Quick Start & Requirements

Installation: Create and activate a Python environment (version 3.10 or later). From the repository root, install the dependencies with pip install -r requirement.txt.
Accelerated Version Dependencies: For the faster implementation, additionally install vllm==0.11.0 and tensorboard.
Prerequisites: Python 3.10+, the accelerate library, and GPU hardware are required to run the fine-tuning scripts.
Documentation: The associated research paper is available at https://arxiv.org/abs/2509.24372.
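Before launching a run, it can help to confirm that the environment matches the requirements listed above. The following is a small sketch using only the Python standard library. The helper names `installed_version` and `check_environment` are hypothetical and not part of the repository, and the check does not cover GPU availability.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 10)
# Package names and the vllm pin are taken from the requirements listed above.
BASE_PACKAGES = ["accelerate"]
ACCELERATED_PACKAGES = {"vllm": "0.11.0", "tensorboard": None}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_environment(python_version=sys.version_info[:2], lookup=installed_version):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if tuple(python_version) < MIN_PYTHON:
        problems.append("Python 3.10 or later is required")
    for name in BASE_PACKAGES:
        if lookup(name) is None:
            problems.append(f"{name} is not installed")
    for name, pinned in ACCELERATED_PACKAGES.items():
        found = lookup(name)
        if found is None:
            problems.append(f"{name} is not installed (accelerated version only)")
        elif pinned is not None and found != pinned:
            problems.append(f"{name} is {found}, expected {pinned}")
    return problems
```

Running `check_environment()` with no arguments inspects the current interpreter. The `lookup` parameter exists only so the function can be exercised without installing anything.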

Highlighted Details

An accelerated version of the code reports a speed-up of more than 10x in running time, making large-scale ES fine-tuning more practical.
The project enables direct optimization of billions of LLM parameters using ES, presenting a distinct paradigm from RL-based fine-tuning.
It offers distinct scripts for fine-tuning using partially correlated noise and complete i.i.d. noise, allowing for experimentation with different ES noise strategies.
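The difference between the two noise strategies can be illustrated in isolation. The summary does not say how the repository produces its partially correlated noise, so the sketch below makes an assumption for illustration only: correlation is introduced by re-creating the random generator with the same seed for every parameter tensor, while i.i.d. noise comes from a single generator advanced across all tensors. The function names are hypothetical.

```python
import numpy as np


def iid_noise(shapes, seed):
    """Independent noise: one generator is advanced across all tensors."""
    rng = np.random.default_rng(seed)
    return [rng.standard_normal(shape) for shape in shapes]


def shared_seed_noise(shapes, seed):
    """Correlated noise (assumed mechanism): each tensor replays the same stream."""
    return [np.random.default_rng(seed).standard_normal(shape) for shape in shapes]


def correlation(a, b):
    """Pearson correlation between two equally shaped noise tensors."""
    return float(np.corrcoef(a.ravel(), b.ravel())[0, 1])


# Two hypothetical parameter tensors of the same shape.
shapes = [(64, 64), (64, 64)]
iid_a, iid_b = iid_noise(shapes, seed=0)
cor_a, cor_b = shared_seed_noise(shapes, seed=0)
```

Under this assumption, `correlation(cor_a, cor_b)` is essentially 1 because both tensors receive the same draws, whereas `correlation(iid_a, iid_b)` stays close to 0.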

Maintenance & Community

The repository is under active development, with ongoing additions of experimental code, and users should anticipate potential breaking changes. A community forum for ES fine-tuning is available in the Discussions section.

Licensing & Compatibility

The provided README does not specify a software license. Consequently, the terms for commercial use, redistribution, or integration into closed-source projects remain undefined.

Limitations & Caveats

The accelerated implementations may introduce breaking changes as development continues. Experimental code is still being added, so the API and feature set may change.
