Recipe for online iterative RLHF to align LLMs
This repository provides a comprehensive "recipe" for implementing online Reinforcement Learning from Human Feedback (RLHF) and iterative Direct Preference Optimization (DPO) for aligning Large Language Models (LLMs). It targets researchers and practitioners aiming to replicate state-of-the-art online RLHF workflows, offering a practical guide to achieve results comparable to or exceeding leading models like LLaMA-3-8B-Instruct using open-source components.
How It Works
The workflow follows a three-step process: Supervised Fine-Tuning (SFT) of a base LLM, training a Reward Model (RM) using pairwise preferences, and then iteratively refining the LLM using RLHF or DPO with generated data. The project emphasizes an "online" approach, suggesting continuous improvement through iterative feedback loops, which is claimed to outperform offline methods. It leverages existing frameworks like Axolotl for SFT and provides scripts for data generation, reward modeling, and the final RLHF/DPO training stages.
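One round of the online loop can be sketched in plain Python. Everything here is a stand-in (the reward model is a toy scoring function, the generation step returns canned strings, and the log-probabilities are fabricated numbers), but the control flow (sample K candidates, rank them with the reward model, keep the best/worst pair, apply a DPO-style loss) mirrors the three-step recipe described above:

```python
import math
import random

def toy_reward(prompt: str, response: str) -> float:
    """Stand-in for a trained reward model: longer, on-topic answers score higher."""
    on_topic = 1.0 if prompt.split()[0].lower() in response.lower() else 0.0
    return len(response.split()) + on_topic

def sample_candidates(prompt: str, k: int = 4) -> list[str]:
    """Stand-in for the generation step (the real recipe uses vllm)."""
    pool = [
        "Paris is the capital of France.",
        "The capital of France is Paris, a city on the Seine.",
        "I do not know.",
        "France's capital city is Paris.",
    ]
    return random.sample(pool, k)

def build_preference_pair(prompt: str, candidates: list[str], reward_fn) -> dict:
    """Rank candidates with the reward model; best becomes 'chosen', worst 'rejected'."""
    ranked = sorted(candidates, key=lambda r: reward_fn(prompt, r))
    return {"prompt": prompt, "chosen": ranked[-1], "rejected": ranked[0]}

def dpo_loss(pi_chosen: float, pi_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """DPO loss for one pair, from summed per-response log-probs
    under the policy (pi_*) and the frozen reference model (ref_*)."""
    margin = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

random.seed(0)
prompt = "capital of France?"
pair = build_preference_pair(prompt, sample_candidates(prompt), toy_reward)
print(pair["chosen"] != pair["rejected"])  # → True: a usable pair for the DPO step
print(round(dpo_loss(-10.0, -14.0, -11.0, -13.0), 4))  # → 0.5981
```

In the actual recipe, `sample_candidates` would be replaced by vllm generation and `toy_reward` by the trained reward model; each iteration's preference pairs then feed the DPO trainer before the next round of generation.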
Quick Start & Requirements
Key dependencies include torch==2.1.2, flash-attn==2.6.3, accelerate, deepspeed, transformers, vllm, wandb, and nvidia-ml-py3. Specific versions are recommended to avoid compatibility issues (e.g., numpy<2.0). Access to gated models requires authentication via huggingface-cli login.
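A minimal environment setup might look like the following. The version pins come from the requirements above; the environment name, Python version, and install ordering are assumptions, not taken from the original README:

```shell
# Create an isolated environment (conda assumed; venv works too).
conda create -n online-rlhf python=3.10 -y
conda activate online-rlhf

# Pinned versions from the requirements list.
pip install "torch==2.1.2" "numpy<2.0"
pip install "flash-attn==2.6.3"   # built against torch, so install torch first
pip install accelerate deepspeed transformers vllm wandb nvidia-ml-py3

# Authenticate so gated models (e.g., LLaMA-3 checkpoints) can be downloaded.
huggingface-cli login
```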
Setup may also require a patch to axolotl/src/axolotl/utils/bench.py.
Highlighted Details
A run_loop2.sh script automates the entire iterative training process.
Maintenance & Community
The project acknowledges contributions from Hugging Face teams (TRL, H4), Meta LLaMA, the Allen Institute for AI, evalplus, and Axolotl. Community links are not explicitly provided in the README.
Licensing & Compatibility
The repository itself does not specify a license. However, it heavily relies on and integrates with other projects, including Hugging Face libraries and models, which have their own licenses. Compatibility for commercial use would depend on the licenses of the underlying models and frameworks used.
Limitations & Caveats
Strict version requirements for numpy and torch are noted, indicating potential compatibility challenges.