ReaLHF by openpsi-project

Efficient RLHF training system for LLMs using parameter reallocation

Created 1 year ago
330 stars

Top 83.1% on SourcePulse

Project Summary

This repository provides ReaLHF, a distributed system for efficient Reinforcement Learning from Human Feedback (RLHF) training of Large Language Models (LLMs). It targets researchers and engineers working on LLM alignment, and aims for higher training throughput and better memory efficiency than existing RLHF systems through its parameter reallocation technique.

How It Works

ReaLHF uses parameter reallocation: during training it dynamically redistributes LLM parameters across the cluster and adapts the parallelization strategy to each workload. Matching resources to each workload in this way is reported to give higher PPO training throughput than existing systems, especially as model size and GPU count increase. It also supports advanced RLHF algorithms and features such as CUDA graphs for high-throughput generation.
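The idea of re-sharding the same parameters under a different parallel layout can be illustrated with a toy, single-process sketch. Everything here is an assumption made for illustration: `Layout`, `shard`, `gather`, and `reallocate` are hypothetical names, not ReaLHF's API, and the naive gather-then-reshard step stands in for whatever GPU-to-GPU transfer schedule the real system uses.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Layout:
    """Hypothetical parallel layout (illustration only, not ReaLHF's API)."""
    pp: int  # pipeline stages: layers are split across these
    tp: int  # tensor-parallel degree: each layer's weights are split across these


# (pp_rank, tp_rank) -> {layer index: slice of that layer's flat weights}
Shards = Dict[Tuple[int, int], Dict[int, List[float]]]


def shard(params: List[List[float]], layout: Layout) -> Shards:
    """Partition full parameters into per-rank shards for the given layout."""
    assert len(params) % layout.pp == 0, "layers must divide evenly across stages"
    per_stage = len(params) // layout.pp
    shards: Shards = {}
    for p in range(layout.pp):
        for t in range(layout.tp):
            local: Dict[int, List[float]] = {}
            for layer in range(p * per_stage, (p + 1) * per_stage):
                weights = params[layer]
                assert len(weights) % layout.tp == 0, "weights must divide evenly"
                chunk = len(weights) // layout.tp
                local[layer] = weights[t * chunk:(t + 1) * chunk]
            shards[(p, t)] = local
    return shards


def gather(shards: Shards, layout: Layout) -> List[List[float]]:
    """Reassemble full parameters from shards (tensor-parallel ranks in order)."""
    layers: Dict[int, List[float]] = {}
    for p in range(layout.pp):
        for t in range(layout.tp):
            for layer, piece in shards[(p, t)].items():
                layers.setdefault(layer, []).extend(piece)
    return [layers[i] for i in sorted(layers)]


def reallocate(shards: Shards, src: Layout, dst: Layout) -> Shards:
    """Toy reallocation: gather under the source layout, re-shard under the target.

    A real system would move only the slices each destination rank needs.
    """
    return shard(gather(shards, src), dst)


# Four layers with four weights each; values encode (layer, index) for readability.
params = [[float(10 * layer + i) for i in range(4)] for layer in range(4)]
train_layout = Layout(pp=2, tp=2)  # assumed layout for a training step
gen_layout = Layout(pp=1, tp=4)    # assumed layout for a generation step

train_shards = shard(params, train_layout)
gen_shards = reallocate(train_shards, train_layout, gen_layout)
```

The two layouts hold identical parameters but place them differently, which is what lets each workload (for example, generation versus a training step) run under a parallelization chosen for it.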

Quick Start & Requirements

  • Install from source by running, in order:
      git clone ...
      pip install -r requirements.txt
      pip install git+https://github.com/NVIDIA/TransformerEngine.git@v1.8 --no-deps --no-build-isolation
      pip install flash_attn==2.4.2 --no-build-isolation
      pip3 install git+https://github.com/tgale96/grouped_gemm.git@v0.1.4 --no-build-isolation --no-deps
      REAL_CUDA=1 pip install -e . --no-build-isolation
  • GPU dependencies: NVIDIA TransformerEngine, FlashAttention, grouped_gemm.
  • Documentation: official project documentation (linked from the repository).
  • Tutorial: Reproduce full RLHF with 4xLLaMA-7B in 30 minutes.

Highlighted Details

  • Achieves state-of-the-art training throughput for RLHF via parameter reallocation.
  • Supports large-scale SFT, reward modeling, DPO, PPO, and generation.
  • Integrates seamlessly with HuggingFace checkpoints and vLLM.
  • Offers flexibility with Hydra configuration and support for custom algorithms like GRPO.

Maintenance & Community

  • Development moved to AReaL.
  • WeChat group available for technical discussions.
  • Notable contributors from Tsinghua University and OpenPsi Inc.
  • References implementations from Megatron-LM and DeepSpeed.

Licensing & Compatibility

  • No license specified in the README.
  • Compatible with HuggingFace and vLLM.

Limitations & Caveats

This repository has been archived, with development moved to a new project (AReaL). The specific license for this archived version is not stated, which may impact commercial use or closed-source integration.

Health Check

  • Last commit: 8 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 3 stars in the last 30 days
