Self-Play Fine-Tuning (SPIN) research paper implementation
Self-Play Fine-Tuning (SPIN) offers a method for LLMs to improve by generating their own training data through self-play, eliminating the need for extensive human-annotated preference data beyond an initial SFT dataset. This approach is designed for researchers and practitioners aiming to enhance LLM performance efficiently.
How It Works
SPIN refines an LLM by having the current model generate responses to prompts and then training the next iterate to distinguish those self-generated responses from the human-written responses in the original SFT data. In theory, the procedure's optimum is reached only when the LLM's distribution matches the target data distribution; empirically, SPIN outperforms methods such as DPO on benchmark datasets.
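In practice this objective is a logistic loss on the gap between the model's log-probability ratios for real (SFT) responses and for self-generated responses, closely resembling the DPO loss with self-generated data in the "rejected" slot. Below is a minimal PyTorch sketch of that loss; the function name, tensor names, and the beta weighting are assumptions for illustration, not the repository's exact code.

```python
import torch
import torch.nn.functional as F

def spin_loss(policy_real_logps: torch.Tensor,
              policy_generated_logps: torch.Tensor,
              opponent_real_logps: torch.Tensor,
              opponent_generated_logps: torch.Tensor,
              beta: float = 0.1) -> torch.Tensor:
    """Logistic loss pushing the policy to score SFT ("real") responses
    above responses sampled from the previous-iteration ("opponent") model.

    Each argument is a batch of per-sequence log-probabilities; beta scales
    the margin (its name and default value are assumptions in this sketch).
    """
    real_logratios = policy_real_logps - opponent_real_logps
    generated_logratios = policy_generated_logps - opponent_generated_logps
    margin = beta * (real_logratios - generated_logratios)
    # log(1 + exp(-margin)), written stably as -logsigmoid(margin)
    return -F.logsigmoid(margin).mean()
```

At each iteration, the opponent log-probabilities come from a frozen copy of the previous checkpoint; only the current policy receives gradients.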
Quick Start & Requirements
conda create -n myenv python=3.10
conda activate myenv
pip install .
pip install flash-attn --no-build-isolation
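A quick, optional sanity check (an assumption of this summary, not part of the repository's instructions) that the environment built correctly on a CUDA machine:

```python
# Optional sanity check for the environment created above.
import torch
import flash_attn  # an ImportError here means flash-attn failed to build

print(f"CUDA available: {torch.cuda.is_available()}")
print(f"flash-attn version: {flash_attn.__version__}")
```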
Limitations & Caveats
The README notes that Alignment Handbook configurations and SFT checkpoints have been updated since the experiments; users must use specific older revisions or generate their own data if using newer SFT models. Data generation can be time-consuming, and smaller frac_len values are recommended to avoid crashes.
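The crash-avoidance advice amounts to bounding how many prompts a single generation run processes. Below is a hypothetical sketch of that chunking pattern using Hugging Face transformers; the function name, the frac_len/data_frac parameters, and the generation settings are illustrative assumptions, not the repository's actual generation script.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

def generate_chunk(model_name: str, prompts: list[str],
                   frac_len: int, data_frac: int) -> list[str]:
    """Generate self-play responses for one frac_len-sized slice of the
    prompt set; running slices separately keeps any single run small."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

    # data_frac selects which frac_len-sized chunk of prompts to process.
    chunk = prompts[data_frac * frac_len:(data_frac + 1) * frac_len]
    completions = []
    for prompt in chunk:
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=True)
        completions.append(tokenizer.decode(output_ids[0], skip_special_tokens=True))
    return completions
```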