Efficient RLHF training system for LLMs using parameter reallocation
This repository provides ReaLHF, a distributed system for efficient Reinforcement Learning from Human Feedback (RLHF) training of Large Language Models (LLMs). It targets researchers and engineers working on LLM alignment, offering significantly higher training throughput and memory efficiency through its novel parameter reallocation technique.
How It Works
ReaLHF employs a parameter reallocation strategy, dynamically redistributing LLM parameters and adapting parallelization across a cluster during training. This approach optimizes resource allocation for each workload, leading to superior PPO training throughput compared to existing systems, especially as model size and GPU count increase. It also supports advanced RLHF algorithms and features like CUDAGraphs for high-throughput generation.
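The toy sketch below illustrates the core idea only and is not ReaLHF's actual API: each RLHF phase (generation, inference, training) may prefer a different parameter placement, so parameters are moved between phases rather than kept in one fixed layout. The `reallocate` helper here is hypothetical and stands in for ReaLHF's real resharding machinery, which redistributes across tensor/pipeline/data-parallel groups.

```python
# Illustrative sketch only; ReaLHF's real system reshards parameters across
# parallelization groups, not merely across single devices.
import torch
import torch.nn as nn

def reallocate(model: nn.Module, device: torch.device) -> nn.Module:
    # Hypothetical helper: move parameters to the layout preferred by the
    # next workload. A real implementation would change the sharding, too.
    return model.to(device)

actor = nn.Linear(16, 16)

# Phase 1: generation-friendly placement (e.g., replicated on a fast device).
gen_device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
actor = reallocate(actor, gen_device)
with torch.no_grad():
    rollout = actor(torch.randn(4, 16, device=gen_device))

# Phase 2: training-friendly placement (e.g., resharded for optimizer states).
train_device = torch.device("cpu")
actor = reallocate(actor, train_device)
loss = actor(torch.randn(4, 16)).pow(2).mean()
loss.backward()
```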
Quick Start & Requirements
```bash
git clone ...
pip install -r requirements.txt
pip install git+https://github.com/NVIDIA/TransformerEngine.git@v1.8 --no-deps --no-build-isolation
pip install flash_attn==2.4.2 --no-build-isolation
pip3 install git+https://github.com/tgale96/grouped_gemm.git@v0.1.4 --no-build-isolation --no-deps
REAL_CUDA=1 pip install -e . --no-build-isolation
```
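After installation, a quick import check can confirm the compiled dependencies are usable. This is a generic sanity check, not part of ReaLHF's documentation; it assumes a CUDA-capable environment and that the editable install above registers the package under the `realhf` import name.

```python
# Post-install sanity check (assumption: CUDA GPUs are present).
import torch
assert torch.cuda.is_available(), "ReaLHF targets CUDA GPUs"

import flash_attn          # built above with flash_attn==2.4.2
import transformer_engine  # built above from TransformerEngine v1.8
import realhf              # assumed import name of the editable install

print("torch", torch.__version__, "| flash-attn", flash_attn.__version__)
```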
Limitations & Caveats
This repository has been archived; development has moved to a successor project, AReaL. No license is stated for this archived version, which may complicate commercial use or closed-source integration.