Spurious_Rewards by ruixin31

RL fine-tuning research

created 2 months ago
323 stars

Top 85.3% on sourcepulse

View on GitHub
Project Summary

This repository accompanies research on spurious rewards in reinforcement learning with verifiable rewards (RLVR) for large language models (LLMs). It provides a training and evaluation framework, along with experimental results, for studying how different reward signals, including spurious ones, affect model performance on complex reasoning tasks. The project is targeted at researchers and engineers working on LLM alignment and fine-tuning.

How It Works

The project investigates how different reward functions affect LLM training, examining whether "spurious rewards" (e.g., signals tied to superficial formatting or content irrelevant to correctness) mislead the learning process or still yield gains. The codebase extends the TTRL framework, which is itself built on OpenRLHF, and adds custom features such as asynchronous evaluation. The core idea is to isolate individual reward signals, such as mathematical equivalence of the final answer or the mere presence of Python-formatted output, and measure each signal's contribution to reasoning performance.
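
To make the reward-isolation idea concrete, here is a minimal sketch of two contrasting reward functions: a ground-truth signal that checks mathematical equivalence of the model's boxed answer, and a spurious signal that rewards Python-style formatting regardless of correctness. The function names and the use of sympy are illustrative assumptions, not the repository's actual implementation.

    # Illustrative sketch only; not the repository's actual reward code.
    import re
    import sympy

    def equivalence_reward(completion: str, ground_truth: str) -> float:
        """Ground-truth signal: 1.0 if the boxed answer is mathematically
        equivalent to the reference (e.g. "2/4" matches "1/2"), else 0.0."""
        match = re.search(r"\\boxed\{([^}]*)\}", completion)
        if match is None:
            return 0.0
        try:
            diff = sympy.simplify(
                sympy.sympify(match.group(1)) - sympy.sympify(ground_truth)
            )
            return 1.0 if diff == 0 else 0.0
        except (sympy.SympifyError, TypeError):
            return 0.0

    def format_only_reward(completion: str, ground_truth: str) -> float:
        """Spurious signal: rewards any completion containing a Python code
        block, regardless of whether the final answer is correct."""
        return 1.0 if "```python" in completion else 0.0

Training against each signal in isolation makes it possible to attribute performance changes to a specific component of the reward rather than to the reward as a whole.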

Quick Start & Requirements

  • Install: clone the repository, then create and activate a conda environment and install the dependencies (a quick post-install sanity check follows this list):

        conda create -n spurious-rewards python=3.10
        conda activate spurious-rewards
        pip install -r requirements.txt
        pip install flash_attn==2.7.0.post2
        pip install -e .
  • Prerequisites: Python 3.10, CUDA 11.8+ (for flash_attn), and specific hardware for exact reproduction (NVIDIA A100 80GB or H200).
  • Resources: Requires significant computational resources for training and evaluation.
  • Links: GitHub, Website, Paper, Wandb, Models.
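
Before launching a long run, it can help to confirm that the GPU stack is visible and the pinned flash_attn wheel built correctly. This is a generic sanity check, not part of the repository:

    # Generic post-install sanity check (not part of the repository).
    import torch

    assert torch.cuda.is_available(), "CUDA not visible; check driver/toolkit install"
    print("GPU:", torch.cuda.get_device_name(0))

    import flash_attn  # raises ImportError here if the pinned wheel failed to build
    print("flash_attn:", flash_attn.__version__)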

Highlighted Details

  • Focuses on isolating and evaluating specific reward signals for RLVR.
  • Reports how reasoning performance changes under spurious versus ground-truth reward signals.
  • Codebase is built on TTRL and OpenRLHF with added custom features.
  • Provides reproduction scripts for evaluation results.

Maintenance & Community

The project lists numerous academic affiliations for its authors, indicating strong research backing. Links to Twitter and a Notion site are provided for community engagement and project information.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Exact reproduction of the evaluation results requires specific high-end GPUs (NVIDIA A100 80GB or H200) and a matching --shards parameter, because vLLM generations can fluctuate with batch size. The project is research-oriented, and its readiness for production deployment is not detailed.
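
To illustrate why the shard count matters, here is a hypothetical sketch of deterministic round-robin sharding. The helper is not the repository's code; it only shows how a different --shards value regroups the prompts that are decoded together, which is how batch composition can shift vLLM's generations:

    # Hypothetical sharding helper, for illustration only.
    def shard(dataset: list, num_shards: int, shard_id: int) -> list:
        """Round-robin split: shard_id takes every num_shards-th item."""
        return dataset[shard_id::num_shards]

    problems = [f"problem_{i}" for i in range(8)]
    print(shard(problems, 2, 0))  # ['problem_0', 'problem_2', 'problem_4', 'problem_6']
    print(shard(problems, 4, 0))  # ['problem_0', 'problem_4']
    # Each shard is decoded as its own batch, so changing num_shards changes
    # batch composition, and generations can fluctuate accordingly.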

Health Check

  • Last commit: 4 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 1
  • Issues (30d): 1
  • Star history: 326 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), John Yang (author of SWE-bench, SWE-agent), and 7 more.

tree-of-thought-llm by princeton-nlp

Research-paper implementation of Tree of Thoughts (ToT) prompting. Top 0.3% on sourcepulse, 5k stars, created 2 years ago, updated 6 months ago.