ReaLHF by openpsi-project

Efficient RLHF training system for LLMs using parameter reallocation

Created 1 year ago

330 stars

Top 83.1% on SourcePulse

1 Expert Loves This Project

Yaowei Zheng

Author of LLaMA-Factory

Project Summary

This repository provides ReaLHF, a distributed system for efficient Reinforcement Learning from Human Feedback (RLHF) training of Large Language Models (LLMs). It targets researchers and engineers working on LLM alignment and aims for higher training throughput and better memory efficiency through parameter reallocation, a technique that redistributes model parameters across GPUs during training.

How It Works

ReaLHF employs a parameter reallocation strategy: during training it dynamically redistributes LLM parameters across the cluster and adapts the parallelization strategy to each workload. According to the project, matching resources to each workload in this way yields higher PPO training throughput than existing systems, especially as model size and GPU count increase. It also supports advanced RLHF algorithms and features such as CUDAGraphs for high-throughput generation.
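
To make the idea concrete, here is a toy sketch in plain Python. It is not ReaLHF's API: the `Layout` class, the `shard`, `gather`, and `reallocate` helpers, and the two example layouts are all hypothetical names chosen for illustration. It only shows the core move, which is re-splitting the same parameters when the workload changes (for example, from generation to training).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    """Hypothetical parallel layout: dp replicas, each split tp ways."""
    dp: int  # data-parallel degree (number of replicas)
    tp: int  # tensor-parallel degree (shards per replica)

    @property
    def gpus(self) -> int:
        return self.dp * self.tp


def shard(params: list[float], layout: Layout) -> list[list[float]]:
    """Return one parameter shard per GPU.

    GPU g holds tensor-parallel piece (g % tp); replicas repeat the pieces.
    """
    if len(params) % layout.tp != 0:
        raise ValueError("parameter count must divide evenly by tp")
    size = len(params) // layout.tp
    pieces = [params[i * size:(i + 1) * size] for i in range(layout.tp)]
    return [list(pieces[g % layout.tp]) for g in range(layout.gpus)]


def gather(shards: list[list[float]], layout: Layout) -> list[float]:
    """Rebuild the full parameter list from the first replica's shards."""
    full: list[float] = []
    for piece in shards[:layout.tp]:
        full.extend(piece)
    return full


def reallocate(shards: list[list[float]], src: Layout, dst: Layout) -> list[list[float]]:
    """Move parameters from the src layout to the dst layout."""
    return shard(gather(shards, src), dst)


# Illustrative choices only: more replicas for generation, more sharding for training.
GENERATION = Layout(dp=4, tp=2)
TRAINING = Layout(dp=1, tp=8)

params = [float(i) for i in range(16)]
gen_shards = shard(params, GENERATION)
train_shards = reallocate(gen_shards, GENERATION, TRAINING)
```

The sketch gathers the full parameter list before re-splitting for simplicity. A real system would presumably move only the slices each GPU is missing, since a full gather may not fit in memory for large models.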

Quick Start & Requirements

Install from source:
  git clone ...
  pip install -r requirements.txt
  pip install git+https://github.com/NVIDIA/TransformerEngine.git@v1.8 --no-deps --no-build-isolation
  pip install flash_attn==2.4.2 --no-build-isolation
  pip3 install git+https://github.com/tgale96/grouped_gemm.git@v0.1.4 --no-build-isolation --no-deps
  REAL_CUDA=1 pip install -e . --no-build-isolation
GPU dependencies: NVIDIA TransformerEngine, FlashAttention, grouped_gemm.
Documentation: Documentation
Tutorial: Reproduce full RLHF with 4xLLaMA-7B in 30 minutes.
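
After installing, a quick way to confirm that the three GPU dependencies listed above are visible to the interpreter is to look up their modules without importing them. The sketch below is an illustration, not part of ReaLHF; the import names (`transformer_engine`, `flash_attn`, `grouped_gemm`) are assumed to be the module names those packages install, and `check_dependencies` is a hypothetical helper.

```python
from __future__ import annotations

import importlib.util

# Assumed import names for the GPU dependencies listed above.
GPU_DEPENDENCIES = {
    "NVIDIA TransformerEngine": "transformer_engine",
    "FlashAttention": "flash_attn",
    "grouped_gemm": "grouped_gemm",
}


def check_dependencies(modules: dict[str, str]) -> dict[str, bool]:
    """Map each dependency label to whether its top-level module can be found."""
    return {
        label: importlib.util.find_spec(name) is not None
        for label, name in modules.items()
    }


if __name__ == "__main__":
    for label, found in check_dependencies(GPU_DEPENDENCIES).items():
        print(f"{label}: {'found' if found else 'MISSING'}")
```

This only checks that each package is discoverable. It does not verify versions or that the CUDA extensions were built correctly.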

Highlighted Details

Achieves state-of-the-art training throughput for RLHF via parameter reallocation.
Supports large-scale SFT, reward modeling, DPO, PPO, and generation.
Integrates seamlessly with HuggingFace checkpoints and vLLM.
Offers flexibility with Hydra configuration and support for custom algorithms like GRPO.
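
For readers unfamiliar with GRPO, the step that distinguishes it from PPO is that advantages are computed relative to a group of responses sampled for the same prompt, rather than from a learned value function. The sketch below shows that normalization in plain Python. It is not ReaLHF's implementation; the function name, the `eps` term, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one prompt's group of sampled responses.

    Each advantage is (reward - group mean) / (group std + eps).
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, see lead-in
    return [(r - mu) / (sigma + eps) for r in rewards]


# One prompt, three sampled responses with scalar rewards.
advantages = group_relative_advantages([1.0, 2.0, 3.0])
```

Responses that score above the group mean get a positive advantage and those below get a negative one. A group with identical rewards yields all-zero advantages.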

Maintenance & Community

Development has moved to AReaL.
WeChat group available for technical discussions.
Notable contributors from Tsinghua University and OpenPsi Inc.
References implementations from Megatron-LM and DeepSpeed.

Licensing & Compatibility

No license specified in the README.
Compatible with HuggingFace and vLLM.

Limitations & Caveats

This repository has been archived, with development moved to a new project (AReaL). The specific license for this archived version is not stated, which may impact commercial use or closed-source integration.

Health Check

Last Commit

8 months ago

Responsiveness

1 day

Star History

3 stars in the last 30 days