Training framework for self-rewarding language models
This repository provides a PyTorch implementation of the self-rewarding language model training framework from Meta AI, along with an implementation of SPIN (Self-Play Fine-Tuning). It is aimed at researchers and practitioners who want to fine-tune language models on preference data and self-generated rewards, with the goal of improving model performance without external human feedback.
How It Works
The library implements training pipelines for both Self-Rewarding Language Models and SPIN. A self-rewarding model acts as its own judge: it generates rewards for its own responses, and the resulting preference pairs are used to fine-tune it with Direct Preference Optimization (DPO). SPIN instead produces preference pairs through self-play, and likewise fine-tunes the model with DPO. The framework is modular, allowing users to define custom reward prompts and orchestrate arbitrary sequences of SFT, SPIN, and self-rewarding DPO stages.
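To make the training signal concrete, below is a minimal PyTorch sketch of the standard DPO objective that both the self-rewarding and SPIN stages optimize. The function and argument names are illustrative, not the library's API.

```python
import torch
import torch.nn.functional as F

def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log p_theta(chosen | prompt), shape (batch,)
    policy_rejected_logps: torch.Tensor,  # log p_theta(rejected | prompt), shape (batch,)
    ref_chosen_logps: torch.Tensor,       # same quantities under the frozen reference model
    ref_rejected_logps: torch.Tensor,
    beta: float = 0.1,                    # temperature controlling deviation from the reference
) -> torch.Tensor:
    # DPO maximizes the margin between chosen and rejected responses,
    # with each log-probability measured relative to the reference model.
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```

In the self-rewarding stage the chosen/rejected pairs come from the model's own judge scores; in the SPIN stage they come from pairing real data against self-generated responses.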
Quick Start & Requirements
pip install self-rewarding-lm-pytorch
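As a rough illustration of the custom reward prompt idea, the sketch below follows the paper's LLM-as-a-Judge additive scoring scheme: the judge model is prompted to score a response and its numeric score is parsed out. The template and parser here are hypothetical; the actual hooks exposed by the library may differ.

```python
import re

# Hypothetical judge prompt following the paper's additive 5-point scoring scheme.
REWARD_PROMPT_TEMPLATE = """Review the user's question and the response below.
Award points using a 5-point additive scale, then end with "Score: <points>".

Question: {prompt}
Response: {response}
"""

def parse_reward(judge_output: str) -> float | None:
    # Extract the numeric score the judge model appends; return None if the
    # output does not follow the expected format so the pair can be discarded.
    match = re.search(r"Score:\s*([0-5](?:\.\d+)?)", judge_output)
    return float(match.group(1)) if match else None
```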
Highlighted Details
Uses x-transformers for the model architecture (see the sketch below).
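For context, a small decoder-only base model is typically constructed with x-transformers along these lines; the hyperparameters are placeholders, not recommended settings.

```python
from x_transformers import TransformerWrapper, Decoder

model = TransformerWrapper(
    num_tokens = 256,      # vocabulary size
    max_seq_len = 1024,    # context length
    attn_layers = Decoder(
        dim = 512,
        depth = 6,
        heads = 8
    )
)
```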
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats