RLHF simulation framework for accessible instruction-following/alignment research
This repository provides AlpacaFarm, a simulation framework for developing and evaluating methods that learn from human feedback, such as RLHF. It targets researchers and developers in NLP and AI alignment, enabling them to iterate on feedback-based learning algorithms without the cost and complexity of collecting real human data.
How It Works
AlpacaFarm simulates pairwise preference data using large language models (like GPT-4) as automated annotators, mimicking human judgment with added noise for realism. It offers automated evaluation pipelines and reference implementations of key algorithms (PPO, Best-of-N, DPO, Expert Iteration) for instruction following and alignment research. This approach significantly reduces the cost and effort associated with developing these methods.
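As a rough illustration of the simulated-annotator workflow, the sketch below assumes the auto-annotation module exposes a PairwiseAutoAnnotator with an annotate_pairs method, following the project's documented examples; treat the exact import path, constructor arguments, and field names as assumptions to verify against the repository.

```python
# Minimal sketch, assuming the auto-annotation API below; requires OPENAI_API_KEY to be set.
from alpaca_farm.auto_annotations import PairwiseAutoAnnotator

# Each record pairs two candidate outputs for the same instruction (field names assumed).
outputs_pairs = [
    {
        "instruction": "Summarize the sentence.",
        "input": "AlpacaFarm simulates human feedback with LLM annotators.",
        "output_1": "It simulates feedback using LLM judges.",
        "output_2": "A framework exists.",
    }
]

# The simulated annotator pool mimics human raters and injects noise for realism.
annotator = PairwiseAutoAnnotator()
annotated = annotator.annotate_pairs(outputs_pairs)

# Each record should come back with a preference label indicating which output won.
print(annotated[0])
```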
Quick Start & Requirements
Install the package with pip install alpaca-farm. Simulated annotation requires an OpenAI API key (set the OPENAI_API_KEY environment variable). For optimizations like FlashAttention, install flash-attn and apex.
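For orientation, here is a minimal sketch of the automated evaluation path, assuming an alpaca_leaderboard helper in the auto-annotations module as shown in the project's examples; the helper name, its parameters, and the tiny in-memory outputs list are assumptions, and a real run would score outputs generated on the AlpacaFarm evaluation split.

```python
import os

# The simulated annotators call the OpenAI API, so the key must be available.
os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder; exporting it in your shell also works

# Assumed helper: compares your model's outputs against reference systems
# and reports a simulated win rate on the leaderboard.
from alpaca_farm.auto_annotations import alpaca_leaderboard

# Toy outputs for illustration; in practice these come from running your model
# on the AlpacaFarm evaluation instructions.
my_outputs = [
    {"instruction": "Name a prime number.", "input": "", "output": "7"},
]

alpaca_leaderboard(path_or_all_outputs=my_outputs, name="my-model")
```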
Highlighted Details
Maintenance & Community
The project is associated with the Tatsu Lab at Stanford University. The README notes that the default annotator changed from text-davinci-003 to GPT-4, which affects comparability with older results.
Licensing & Compatibility
The dataset and weight diffs are licensed under CC BY-NC 4.0, restricting them to non-commercial research use.
Limitations & Caveats
The non-commercial license on the dataset and weight diffs rules out commercial applications. Recent results are not directly comparable to older benchmarks because the primary annotator switched to GPT-4. Training RLHF with PPO requires at least eight 80GB A100 GPUs.