Online-RLHF by RLHFlow

Recipe for online iterative RLHF to align LLMs

created 1 year ago
524 stars

Top 61.1% on sourcepulse

Project Summary

This repository provides a comprehensive "recipe" for implementing online Reinforcement Learning from Human Feedback (RLHF) and iterative Direct Preference Optimization (DPO) for aligning Large Language Models (LLMs). It targets researchers and practitioners aiming to replicate state-of-the-art online RLHF workflows, offering a practical guide to achieve results comparable to or exceeding leading models like LLaMA-3-8B-Instruct using open-source components.

How It Works

The workflow follows a three-step process: Supervised Fine-Tuning (SFT) of a base LLM, training a Reward Model (RM) on pairwise preferences, and then iteratively refining the LLM with RLHF or DPO on freshly generated data. The project emphasizes an "online" approach: in each iteration the current policy generates new responses, the reward model ranks them, and the resulting preference pairs drive the next round of training, which is claimed to outperform purely offline methods. It leverages existing frameworks such as Axolotl for SFT and provides scripts for data generation, reward modeling, and the final RLHF/DPO training stages.
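The generate → score → pair half of each online iteration is the part most worth picturing concretely. The snippet below is a minimal sketch under stated assumptions, not the repository's actual pipeline: the policy and reward-model names are placeholders, and a generic sequence-classification head stands in for the repository's Bradley-Terry reward models. It samples several responses per prompt with vLLM, scores each one, and keeps the highest- and lowest-scoring responses as a (chosen, rejected) DPO pair.

```python
# Minimal sketch of one online-iteration data pass: sample K responses per
# prompt with vLLM, score them with a Bradley-Terry-style reward model, and
# keep the best/worst responses as a (chosen, rejected) DPO pair.
# Model names are placeholders, not the repository's actual checkpoints.
import torch
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer, AutoModelForSequenceClassification

POLICY = "meta-llama/Meta-Llama-3-8B-Instruct"  # current policy (placeholder)
REWARD = "my-org/bradley-terry-rm"              # hypothetical scalar reward model

prompts = ["Explain RLHF in two sentences.", "Write a haiku about GPUs."]

# 1) Generate several candidate responses per prompt with vLLM.
llm = LLM(model=POLICY)
sampling = SamplingParams(n=8, temperature=1.0, top_p=0.9, max_tokens=512)
generations = llm.generate(prompts, sampling)

# 2) Score every (prompt, response) pair with the reward model.
rm_tok = AutoTokenizer.from_pretrained(REWARD)
rm = AutoModelForSequenceClassification.from_pretrained(REWARD, num_labels=1).eval()

def reward(prompt: str, response: str) -> float:
    inputs = rm_tok(prompt, response, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return rm(**inputs).logits[0, 0].item()

# 3) Keep the best/worst responses as a preference pair for the next DPO round.
dpo_pairs = []
for prompt, gen in zip(prompts, generations):
    scored = sorted(
        ((reward(prompt, o.text), o.text) for o in gen.outputs), reverse=True
    )
    dpo_pairs.append(
        {"prompt": prompt, "chosen": scored[0][1], "rejected": scored[-1][1]}
    )
# `dpo_pairs` would then be fed to a DPO trainer to produce the next policy.
```

In the repository this loop is driven by shell scripts (see run_loop2.sh below); the sketch only illustrates the data flow each iteration follows.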

Quick Start & Requirements

  • Installation: Requires separate Conda environments for SFT, inference, and training. Key dependencies include torch==2.1.2, flash-attn==2.6.3, accelerate, deepspeed, transformers, vllm, wandb, and nvidia-ml-py3. Specific versions are recommended to avoid compatibility issues (e.g., numpy<2.0); a quick sanity-check sketch follows this list.
  • Prerequisites: CUDA 12.1/12.2 is tested. Access to LLaMA-3 models requires huggingface-cli login.
  • Setup: Involves cloning repositories (Axolotl, FastChat, RLHF-Reward-Modeling), installing dependencies, and potentially modifying code (e.g., axolotl/src/axolotl/utils/bench.py).
  • Resources: Training typically requires multiple GPUs (e.g., 8x GPUs for SFT).
  • Links: Axolotl, FastChat, RLHF-Reward-Modeling, Alignment Handbook.
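Given the pinned versions above, a quick environment check can catch mismatches before a long run. The script below is an illustrative sketch, not part of the repository; the pins it checks are simply the ones listed above.

```python
# Optional environment sanity check against the versions recommended above.
# Illustrative only; not part of the repository.
from importlib.metadata import version, PackageNotFoundError

import numpy as np
import torch

PINS = {
    "torch": "2.1.2",       # exact pin recommended
    "flash-attn": "2.6.3",  # exact pin recommended
}

for pkg, wanted in PINS.items():
    try:
        installed = version(pkg)
        status = "OK" if installed == wanted else f"expected {wanted}"
        print(f"{pkg}: {installed} ({status})")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")

# numpy must stay below 2.0 to avoid the compatibility issues noted above.
print("numpy:", np.__version__, "(OK)" if np.__version__ < "2" else "(needs <2.0)")

# CUDA 12.1/12.2 is the tested setup; confirm the GPU stack is visible.
print("CUDA available:", torch.cuda.is_available(), "| CUDA version:", torch.version.cuda)
```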

Highlighted Details

  • Achieves competitive results against LLaMA-3-8B-Instruct using open-source data and models.
  • Provides pre-trained SFT, Reward Models (Bradley-Terry, generative pairwise, mixture-of-expert), and RLHF/DPO models on Hugging Face.
  • Includes detailed scripts for data generation using vLLM (both direct offline inference and API server modes) and data annotation; see the sketch after this list.
  • Offers a run_loop2.sh script to automate the entire iterative training process.
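For the API-server generation mode mentioned in the list, requests typically go through vLLM's OpenAI-compatible endpoint. The snippet below is a hedged sketch assuming a server has already been launched locally (e.g. via `python -m vllm.entrypoints.openai.api_server --model <model>`) on port 8000; the model name, prompt, and sampling settings are placeholders, and the repository's own generation scripts may structure this differently.

```python
# Sketch of data generation against a locally running vLLM OpenAI-compatible
# API server (assumed to be already started on port 8000).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # must match the served model
    messages=[{"role": "user", "content": "Explain iterative DPO in one paragraph."}],
    n=4,               # several samples per prompt, to be ranked by the reward model
    temperature=1.0,
    max_tokens=512,
)

candidates = [choice.message.content for choice in response.choices]
for i, text in enumerate(candidates):
    print(f"--- candidate {i} ---\n{text}\n")
```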

Maintenance & Community

The project acknowledges contributions from various Hugging Face teams (TRL, H4), Meta LLaMA, the Allen Institute for AI, evalplus, and Axolotl. Community links are not explicitly provided in the README.

Licensing & Compatibility

The repository itself does not specify a license. However, it heavily relies on and integrates with other projects, including Hugging Face libraries and models, which have their own licenses. Compatibility for commercial use would depend on the licenses of the underlying models and frameworks used.

Limitations & Caveats

  • Strict version requirements for dependencies like numpy and torch are noted, indicating potential compatibility challenges.
  • The setup process involves manual code modifications and environment management, which can be complex.
  • The "online" nature implies continuous data generation and retraining, requiring significant computational resources and ongoing effort.
Health Check

  • Last commit: 7 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

18 stars in the last 90 days

Explore Similar Projects

Starred by Ross Taylor (Cofounder of General Reasoning; Creator of Papers with Code), Daniel Han (Cofounder of Unsloth), and 4 more.

open-instruct by allenai

Training codebase for instruction-following language models

  • Top 0.2% · 3k stars
  • created 2 years ago · updated 15 hours ago