alpaca_farm  by tatsu-lab

RLHF simulation framework for accessible instruction-following/alignment research

Created 2 years ago
826 stars

Top 42.9% on SourcePulse

Project Summary

This repository provides AlpacaFarm, a simulation framework for developing and evaluating methods that learn from human feedback, such as RLHF. It targets researchers and developers in NLP and AI alignment, enabling them to iterate on feedback-based learning algorithms without the cost and complexity of collecting real human data.

How It Works

AlpacaFarm simulates pairwise preference data by using large language models (such as GPT-4) as automated annotators, with noise injected so that the simulated labels mimic the variability of human judgments. It also provides automated evaluation pipelines and reference implementations of key algorithms (PPO, Best-of-N, DPO, Expert Iteration) for instruction-following and alignment research. Because no human annotators are needed in the loop, developing and comparing these methods takes far less money and effort than collecting real preference data.
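The idea of a noisy simulated annotator can be sketched in a few lines. This is only an illustration under stated assumptions, not AlpacaFarm's actual API: `simulate_preference`, `score_fn`, and `flip_prob` are hypothetical names, the scoring function stands in for an LLM judge such as GPT-4, and the default noise level is an arbitrary illustrative value.

```python
import random
from typing import Callable, Optional


def simulate_preference(
    output_a: str,
    output_b: str,
    score_fn: Callable[[str], float],
    flip_prob: float = 0.25,
    rng: Optional[random.Random] = None,
) -> int:
    """Return 1 if output_a is preferred, 2 if output_b is preferred.

    score_fn is a hypothetical stand-in for an LLM judge; flip_prob is the
    probability that the judge's label is flipped to inject noise.
    """
    if not 0.0 <= flip_prob <= 1.0:
        raise ValueError("flip_prob must be in [0, 1]")
    rng = rng or random.Random()

    # Noise-free judgment: prefer the output with the higher score.
    preferred = 1 if score_fn(output_a) >= score_fn(output_b) else 2

    # With probability flip_prob, swap the label (1 <-> 2).
    if rng.random() < flip_prob:
        preferred = 3 - preferred
    return preferred


if __name__ == "__main__":
    # Toy judge (assumption): longer outputs score higher.
    label = simulate_preference(
        "A detailed, step-by-step answer.",
        "Short answer.",
        score_fn=len,
        rng=random.Random(0),
    )
    print("preferred output:", label)
```

In a real pipeline the toy `len` judge would be replaced by a call to an LLM annotator, and the noise model could be more elaborate than a uniform label flip.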

Quick Start & Requirements

Highlighted Details

  • Supports simulation of preference data using GPT-4 and other LLMs.
  • Provides reference implementations for SFT, Reward Modeling, PPO, Best-of-N, Expert Iteration, Quark, and DPO.
  • Includes automated evaluation for benchmarking models against AlpacaEval.
  • Offers pre-trained checkpoints for various methods trained on simulated and human preferences.
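Of the methods listed above, Best-of-N is the simplest to illustrate: draw N candidate completions and keep the one a reward model scores highest. The sketch below is a generic version under stated assumptions, not the repository's reference implementation; `best_of_n`, `sample_fn`, and `reward_fn` are hypothetical names standing in for a policy model and a trained reward model.

```python
from typing import Callable, List


def best_of_n(
    prompt: str,
    sample_fn: Callable[[str], str],
    reward_fn: Callable[[str, str], float],
    n: int = 4,
) -> str:
    """Sample n completions for prompt and return the highest-reward one.

    sample_fn(prompt) -> completion and reward_fn(prompt, completion) -> score
    are hypothetical stand-ins for a policy and a reward model.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates: List[str] = [sample_fn(prompt) for _ in range(n)]
    # max() returns the first candidate with the top score if there is a tie.
    return max(candidates, key=lambda completion: reward_fn(prompt, completion))


if __name__ == "__main__":
    import itertools

    # Toy policy (assumption): cycles through canned completions.
    canned = itertools.cycle(["ok", "a longer answer", "mid"])
    # Toy reward (assumption): longer completions score higher.
    best = best_of_n(
        "Explain RLHF.",
        sample_fn=lambda prompt: next(canned),
        reward_fn=lambda prompt, completion: float(len(completion)),
        n=3,
    )
    print(best)
```

The cost of this approach grows linearly with N at inference time, which is one reason it is usually treated as a baseline alongside training-time methods such as PPO and DPO.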

Maintenance & Community

The project is associated with the Tatsu Lab at Stanford University. The README notes that the automated annotators were switched from text-davinci-003 to GPT-4, so results produced after the switch are not directly comparable with older ones.

Licensing & Compatibility

The dataset and weight diffs are licensed under CC BY NC 4.0, restricting use to non-commercial, research purposes only.

Limitations & Caveats

The dataset and weight diffs are restricted to non-commercial research use. Results obtained with GPT-4 as the primary annotator are not directly comparable to older results obtained with text-davinci-003. Training RLHF with PPO requires at least 8x 80GB A100 GPUs.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 4 stars in the last 30 days
