SPIN by uclaml

Self-Play Fine-Tuning (SPIN) research paper implementation

created 1 year ago
1,180 stars

Top 33.7% on sourcepulse

Project Summary

Self-Play Fine-Tuning (SPIN) lets an LLM improve by generating its own training data through self-play, so it needs no human-annotated preference data beyond the initial SFT dataset. It is aimed at researchers and practitioners who want to improve LLM performance without collecting additional annotations.

How It Works

SPIN refines an LLM by having it generate responses to the prompts in the SFT dataset and then training it to distinguish those self-generated responses from the original SFT responses. Each iteration uses the previous iteration's model as the generator. In theory, this iterative process converges when the LLM matches the target data distribution; empirically, it is reported to outperform methods such as DPO on benchmark datasets.
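The discrimination step can be sketched as a per-pair loss. This is a minimal sketch, assuming the DPO-style logistic objective described in the SPIN paper, where the SFT response plays the role of the preferred answer and the model's own generation plays the role of the rejected one. The function names, argument names, and the default `beta` value are illustrative assumptions, not the repository's API.

```python
import math


def log_sigmoid(t: float) -> float:
    """Numerically stable log(sigmoid(t))."""
    if t >= 0:
        return -math.log1p(math.exp(-t))
    return t - math.log1p(math.exp(t))


def spin_pair_loss(
    logp_real_new: float,  # log-prob of the SFT response under the model being trained
    logp_real_old: float,  # log-prob of the SFT response under the previous-iteration model
    logp_gen_new: float,   # log-prob of the self-generated response under the model being trained
    logp_gen_old: float,   # log-prob of the self-generated response under the previous-iteration model
    beta: float = 0.1,     # scaling factor; the value here is an illustrative assumption
) -> float:
    """Logistic loss on the gap between the two log-probability ratios."""
    real_ratio = logp_real_new - logp_real_old
    gen_ratio = logp_gen_new - logp_gen_old
    margin = beta * (real_ratio - gen_ratio)
    return -log_sigmoid(margin)
```

When the new model assigns the same log-probabilities as the previous one, the margin is zero and the loss equals log 2. The loss falls as the new model raises the likelihood of the SFT response relative to its own generation.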

Quick Start & Requirements

  • Install dependencies (run from the repository root):
      conda create -n myenv python=3.10
      conda activate myenv
      pip install .
      pip install flash-attn --no-build-isolation
  • Hugging Face CLI login required for model downloads.
  • Official quick-start guide and reproduction scripts are available.

Highlighted Details

  • At iteration 0, achieves performance comparable to DPO trained with 62k data; at iteration 1, surpasses DPO.
  • Supports faster generation using vLLM.
  • Provides trained model checkpoints for all four iterations.
  • Offers detailed scripts for reproducing results across all iterations.

Maintenance & Community

  • Paper accepted to ICML 2024.
  • Code open-sourced in February 2024.
  • Built upon "The Alignment Handbook".

Licensing & Compatibility

  • No explicit license mentioned in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README notes that the Alignment Handbook configurations and SFT checkpoints have been updated since the experiments were run. To reproduce the results, users must pin the specific older revisions; users who start from a newer SFT model must generate their own data. Data generation can be time-consuming, and the README recommends smaller frac_len values to avoid crashes.
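The effect of a smaller frac_len can be illustrated with a short sketch. It assumes that frac_len is the number of prompts handled in one generation run, so a smaller value means more, shorter runs. The helper `shard_bounds` is hypothetical and is not part of the repository.

```python
def shard_bounds(num_prompts: int, frac_len: int) -> list[tuple[int, int]]:
    """Split num_prompts into consecutive [start, end) ranges of at most frac_len prompts.

    Hypothetical helper: assumes frac_len means "prompts per generation run".
    """
    if frac_len <= 0:
        raise ValueError("frac_len must be positive")
    return [
        (start, min(start + frac_len, num_prompts))
        for start in range(0, num_prompts, frac_len)
    ]


# Each range could be generated in its own run, so a crash costs one shard at most.
for start, end in shard_bounds(10, 4):
    print(f"generate prompts {start}..{end - 1}")
```

Under this assumption, a failed run only requires regenerating its own shard rather than the whole dataset.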

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 35 stars in the last 90 days
