RLHF framework for scalable training of large language models
OpenRLHF is a comprehensive, high-performance framework for Reinforcement Learning from Human Feedback (RLHF), designed for ease of use and scalability. It targets researchers and engineers working with large language models, enabling efficient training and fine-tuning of models up to 70B parameters. The framework simplifies complex RLHF workflows, offering significant speedups and memory efficiency.
How It Works
OpenRLHF employs a distributed architecture powered by Ray, separating Actor, Reward, Reference, and Critic models across GPUs for scalable training. It integrates vLLM with Auto Tensor Parallelism (AutoTP) for accelerated sample generation, which constitutes the majority of RLHF training time. Memory efficiency is achieved through DeepSpeed's ZeRO-3 and AutoTP, allowing large model training without heavy frameworks. The PPO implementation incorporates advanced techniques for stability and reward quality.
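The PPO stability techniques mentioned above build on PPO's core clipped surrogate objective. As a minimal illustration (plain Python with toy values, not OpenRLHF's actual implementation), the per-token loss can be sketched as:

```python
import math

def ppo_clip_loss(logprob_new, logprob_old, advantage, clip_eps=0.2):
    """Clipped PPO surrogate loss for a single token (to be minimized)."""
    ratio = math.exp(logprob_new - logprob_old)           # pi_new / pi_old
    unclipped = ratio * advantage
    # Clamp the ratio to [1 - eps, 1 + eps] before weighting the advantage.
    clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps) * advantage
    # Pessimistic bound: take the worse of the two, negate for minimization.
    return -min(unclipped, clipped)

# Toy example: the new policy slightly favors a token with positive advantage.
loss = ppo_clip_loss(logprob_new=-0.9, logprob_old=-1.0, advantage=0.5)
print(round(loss, 4))  # → -0.5526
```

The clipping keeps each update close to the old (reference) policy, which is why the Reference model in the architecture above must stay resident alongside the Actor during training.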
Quick Start & Requirements
docker run --runtime=nvidia -it --rm --shm-size="10g" --cap-add=SYS_ADMIN -v $PWD:/openrlhf nvcr.io/nvidia/pytorch:24.07-py3 bash
sudo pip uninstall xgboost transformer_engine flash_attn pynvml -y
pip install openrlhf

vLLM support requires pip install openrlhf[vllm] or openrlhf[vllm_latest].
Highlighted Details
Maintenance & Community
The project is actively maintained with frequent updates and contributions from various organizations including Google, ByteDance, Tencent, and Alibaba. Community engagement is encouraged via email or GitHub issues.
Licensing & Compatibility
The project is licensed under Apache 2.0, allowing for commercial use and integration with closed-source projects.
Limitations & Caveats
While highly performant, setup can be complex, especially for distributed training. The README cautions that reported benchmark data may be outdated and recommends re-testing against the performance tuning section. Some advanced features, such as RingAttention, require additional installations.