SPIN by uclaml

Self-Play Fine-Tuning (SPIN) research paper implementation

Created 1 year ago
1,198 stars

Top 32.6% on SourcePulse

Project Summary

Self-Play Fine-Tuning (SPIN) lets an LLM improve by generating its own training data through self-play, removing the need for additional human-annotated preference data beyond the initial SFT dataset. It is aimed at researchers and practitioners who want to improve LLM performance efficiently.

How It Works

In each iteration, the current model generates responses to prompts from the SFT dataset and is then trained to distinguish these self-generated responses from the reference responses in that dataset. In theory, repeating this process aligns the LLM with the target data distribution; empirically, it is reported to outperform methods such as DPO on benchmark datasets.
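The training signal described above can be sketched as a per-example loss. This is a minimal illustration, assuming the logistic-loss form of the SPIN objective from the paper; the function names, argument names, and the value of lam are hypothetical and are not the repository's API. Each argument is the summed log-probability of a full response given its prompt, under either the model being trained ("new") or the frozen model from the previous iteration ("old").

```python
import math


def logistic_loss(t: float) -> float:
    """Numerically stable log(1 + exp(-t))."""
    if t >= 0:
        return math.log1p(math.exp(-t))
    return -t + math.log1p(math.exp(t))


def spin_loss(
    logp_real_new: float,  # log p_new(SFT response | prompt)
    logp_real_old: float,  # log p_old(SFT response | prompt)
    logp_gen_new: float,   # log p_new(self-generated response | prompt)
    logp_gen_old: float,   # log p_old(self-generated response | prompt)
    lam: float = 0.1,      # regularization strength (illustrative value)
) -> float:
    """Per-example SPIN-style loss (a sketch, not the repository's code).

    The loss falls when the new model raises the likelihood of the SFT
    response and lowers the likelihood of the response generated by the
    previous-iteration model, both measured relative to that old model.
    """
    real_gain = logp_real_new - logp_real_old
    gen_gain = logp_gen_new - logp_gen_old
    return logistic_loss(lam * (real_gain - gen_gain))


# If the new model equals the old one, the margin is zero and the loss is log 2.
baseline = spin_loss(-10.0, -10.0, -12.0, -12.0)

# Preferring the SFT response over the self-generated one lowers the loss.
improved = spin_loss(-8.0, -10.0, -14.0, -12.0)
```

The structure mirrors a DPO-style pairwise loss, with the SFT response playing the role of the preferred answer and the model's own earlier output playing the role of the rejected one, which is why no separate preference annotations are required.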

Quick Start & Requirements

  • Install dependencies (run in order):
      conda create -n myenv python=3.10
      conda activate myenv
      pip install .
      pip install flash-attn --no-build-isolation
  • Hugging Face CLI login required for model downloads.
  • Official quick-start guide and reproduction scripts are available.

Highlighted Details

  • At iteration 0, SPIN performs comparably to DPO trained with 62k data; at iteration 1, it surpasses DPO.
  • Supports faster generation using vLLM.
  • Provides pre-trained model checkpoints for all four iterations.
  • Offers detailed scripts for reproducing results across all iterations.

Maintenance & Community

  • Paper accepted to ICML 2024.
  • Code open-sourced in February 2024.
  • Built upon "The Alignment Handbook".

Licensing & Compatibility

  • No explicit license mentioned in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README notes that the Alignment Handbook configurations and SFT checkpoints have been updated since the experiments were run. Users must either use the specific older revisions or, if using newer SFT models, generate their own data. Data generation can be time-consuming, and smaller frac_len values are recommended to avoid crashes.
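To illustrate why a smaller frac_len helps, the sketch below splits a prompt set into fixed-size fractions so that a crash loses at most one fraction of generated data. It assumes frac_len means the number of prompts handled per generation run, which this summary does not state explicitly; the helper name fraction_bounds is hypothetical and is not part of the repository.

```python
def fraction_bounds(num_prompts: int, frac_len: int) -> list[tuple[int, int]]:
    """Return half-open (start, end) index ranges of at most frac_len prompts.

    Assumption: each range would be generated and saved as a separate run,
    so a failure only requires redoing that one range.
    """
    if frac_len <= 0:
        raise ValueError("frac_len must be a positive integer")
    return [
        (start, min(start + frac_len, num_prompts))
        for start in range(0, num_prompts, frac_len)
    ]


# Example: 2,000 prompts in fractions of 800 gives two full fractions
# and one shorter final fraction.
bounds = fraction_bounds(2000, 800)
```

A smaller frac_len means more runs but less work lost per failure; the right value depends on available memory and how often generation fails in practice.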

Health Check

  • Last commit: 1 year ago
  • Responsiveness: inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 7 stars in the last 30 days
