Curated list of RLHF resources for language model alignment
This repository is a continuously updated, curated list of resources for Reinforcement Learning from Human Feedback (RLHF). It targets AI researchers and practitioners, particularly those working with Large Language Models (LLMs), by providing a structured overview of papers, codebases, and datasets in this rapidly evolving field. The primary benefit is a centralized, organized knowledge base for tracking the frontier of RLHF research and development.
How It Works
The repository categorizes resources chronologically and thematically, linking to research papers, open-source codebases that implement RLHF techniques, and relevant datasets. It covers the core RLHF recipe: a policy model is optimized with reinforcement learning against a signal derived from human feedback, typically a learned reward model, so that its behavior aligns with complex human values and preferences.
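To make the core idea concrete, here is a minimal sketch of the KL-regularized objective used in many RLHF pipelines (e.g., PPO-style fine-tuning). The function name, beta value, and log-probabilities below are illustrative assumptions, not taken from any particular resource in the list:

```python
# Minimal sketch of the KL-regularized RLHF reward (all values hypothetical).
# A reward-model score is combined with a penalty that keeps the policy close
# to the reference (pre-RLHF) model:
#     R(x, y) = r_RM(x, y) - beta * [log pi(y|x) - log pi_ref(y|x)]

def shaped_reward(reward_model_score: float,
                  policy_logprob: float,
                  reference_logprob: float,
                  beta: float = 0.1) -> float:
    """Reward-model score minus a KL-style penalty toward the reference policy."""
    kl_estimate = policy_logprob - reference_logprob  # single-sample KL estimate
    return reward_model_score - beta * kl_estimate

# Hypothetical numbers for one sampled response y given a prompt x:
print(shaped_reward(reward_model_score=1.8,
                    policy_logprob=-42.5,
                    reference_logprob=-45.0))
# prints ~1.55: the response scores well but pays a small penalty for
# drifting away from the reference model's distribution.
```

The KL term is what discourages the fine-tuned model from exploiting flaws in the reward model; many of the listed methods differ mainly in how this reward signal and penalty are estimated and optimized.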
Quick Start & Requirements
No installation is required: the list is a plain README, and each entry links directly to its paper, codebase, or dataset.
Highlighted Details
Maintenance & Community
The repository is actively maintained and welcomes community contributions. Links to contribution guidelines are provided.
Licensing & Compatibility
The repository itself is released under the Apache 2.0 license. Individual resources (papers, code, datasets) carry their own licenses, which may restrict commercial use or derivative works.
Limitations & Caveats
This is a curated list, not a runnable framework. Users must independently evaluate and integrate the individual resources. The rapid pace of RLHF research means the list requires continuous updates to remain current.