RLHF-Reward-Modeling by RLHFlow

Recipes to train reward models for RLHF

Created 1 year ago
1,449 stars

Top 28.2% on SourcePulse

Project Summary

This repository provides a comprehensive suite of recipes and code for training reward models (RMs) essential for Reinforcement Learning from Human Feedback (RLHF). It caters to researchers and practitioners in LLM alignment, offering implementations of various RM techniques, including Bradley-Terry, pairwise preference, semi-supervised, multi-objective (ArmoRM), and process/outcome-supervised methods. The project aims to facilitate reproducible and state-of-the-art reward modeling for RLHF pipelines.

How It Works

The project implements diverse reward modeling strategies: the classic Bradley-Terry model, pairwise preference models that directly predict the probability that one response is preferred over another, and generative RMs that cast reward modeling as next-token prediction. It also incorporates advanced techniques such as Semi-Supervised Reward Modeling (SSRM) for data augmentation, ArmoRM for multi-objective rewards with context-dependent aggregation, and math-rm for process- and outcome-supervised rewards (PRM/ORM). Decision-tree RMs are included for interpretable preference modeling.
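
These variants differ in architecture, but the classic objective is the Bradley-Terry preference loss. Below is a minimal, hedged sketch of that loss in PyTorch; it is illustrative only, and the repo's actual training scripts (e.g., in the bradley-terry-rm folder) handle model wrapping, batching, and data loading differently.

```python
# Minimal sketch of the Bradley-Terry preference loss (illustrative; not the repo's code).
# P(chosen > rejected) = sigmoid(r_chosen - r_rejected), so we minimize
# -log sigmoid(r_chosen - r_rejected) over a batch of preference pairs.
import torch
import torch.nn.functional as F

def bradley_terry_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    """Negative log-likelihood that the chosen response beats the rejected one."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy example: scalar rewards the RM assigned to three preference pairs.
r_chosen = torch.tensor([1.2, 0.3, 2.1])
r_rejected = torch.tensor([0.4, 0.9, 1.5])
print(bradley_terry_loss(r_chosen, r_rejected))  # lower is better during training
```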

Quick Start & Requirements

  • Installation: Separate environments are recommended for different models. Instructions are provided within respective model folders (e.g., bradley-terry-rm, pair-pm).
  • Prerequisites: Python and standard ML libraries. Training larger models is documented on specific hardware (e.g., 4x A40 48G or 4x A100 80G) using DeepSpeed ZeRO-3 and gradient checkpointing.
  • Data Format: Preference data pairs a 'chosen' and a 'rejected' conversation that share the same prompt (see the example after this list). Preprocessed datasets are available on Hugging Face.
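As an illustration of that layout, here is a hypothetical single preference record; the field names and message schema are assumptions based on the description above, so consult the dataset cards on Hugging Face for the exact format.

```python
# Hypothetical preference record; field names and message schema are assumptions
# based on the 'chosen'/'rejected' conversation format described above.
example = {
    "chosen": [
        {"role": "user", "content": "What is a reward model?"},
        {"role": "assistant", "content": "A model that scores responses by how well they match human preferences."},
    ],
    "rejected": [
        {"role": "user", "content": "What is a reward model?"},  # same prompt as in "chosen"
        {"role": "assistant", "content": "It is a kind of database."},
    ],
}
```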

Highlighted Details

  • Achieves state-of-the-art scores on RewardBench with models like ArmoRM-Llama3-8B-v0.1 (89.0) and Decision-Tree-Reward-Gemma-2-27B (95.4).
  • Provides code for multiple RM architectures: Bradley-Terry, Pairwise Preference, ArmoRM, SSRM, math-rm, and decision-tree RMs.
  • Includes open-sourced data, code, hyperparameters, and models for reproducibility.
  • Supports training for DRL-based RLHF (PPO), Iterative SFT, and iterative DPO.
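
As a hedged usage sketch, the snippet below scores one conversation with a sequence-classification-style reward model, the way a PPO or iterative-DPO pipeline would consume RM scores. The model id is a placeholder, and details such as chat templates, dtype, and ArmoRM's multi-objective head differ per checkpoint; see the repo and model cards for the real loading instructions.

```python
# Hedged sketch: scoring a response with a sequence-classification reward model.
# The model id is a placeholder; real checkpoints, templates, and flags are in the repo.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "RLHFlow/your-reward-model"  # placeholder, not a real checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, torch_dtype=torch.bfloat16)

conversation = [
    {"role": "user", "content": "Explain RLHF in one sentence."},
    {"role": "assistant", "content": "RLHF fine-tunes a model against a reward learned from human preferences."},
]
input_ids = tokenizer.apply_chat_template(conversation, return_tensors="pt")
with torch.no_grad():
    # Assumes a single-score head (num_labels=1); higher means more preferred.
    score = model(input_ids).logits[0].item()
print(score)
```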

Maintenance & Community

  • Active development with recent releases for decision-tree, PRM/ORM, and ArmoRM.
  • Models and code have been contributed to numerous academic research projects.
  • Citation information and BibTeX entries are provided.

Licensing & Compatibility

  • The repository itself does not explicitly state a license in the README. However, the models released (e.g., ArmoRM-Llama3-8B-v0.1) are subject to the base model's license (e.g., Llama 3 license). Compatibility for commercial use depends on the specific model and base LLM licenses.

Limitations & Caveats

  • The README does not specify a repository-wide license, potentially impacting commercial use.
  • Some advanced RM techniques (e.g., LLM-as-a-judge, Inverse-Q*) are listed under "To Do" or not yet implemented within the provided code structure.
Health Check

  • Last Commit: 4 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 15 stars in the last 30 days

Explore Similar Projects

  • self-rewarding-lm-pytorch by lucidrains (1k stars, 0.1%): Training framework for self-rewarding language models. Created 1 year ago; updated 1 year ago. Starred by Vincent Weisser (Cofounder of Prime Intellect), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 6 more.
  • reward-bench by allenai (634 stars, 0%): Reward model evaluation tool. Created 1 year ago; updated 3 months ago. Starred by Lewis Tunstall (Research Engineer at Hugging Face), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 2 more.
  • RL4LMs by allenai (2k stars, 0.0%): RL library to fine-tune language models to human preferences. Created 3 years ago; updated 1 year ago. Starred by Vincent Weisser (Cofounder of Prime Intellect), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 6 more.
  • Eureka by eureka-research (3k stars, 0.2%): LLM-based reward design for reinforcement learning. Created 2 years ago; updated 1 year ago. Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Edward Sun (Research Scientist at Meta Superintelligence Lab).