RLHF framework for scalable training of large language models
OpenRLHF is a comprehensive, high-performance framework for Reinforcement Learning from Human Feedback (RLHF), designed for ease of use and scalability. It targets researchers and engineers working with large language models, enabling efficient training and fine-tuning of models up to 70B parameters. The framework simplifies complex RLHF workflows, offering significant speedups and memory efficiency.
How It Works
OpenRLHF employs a distributed architecture powered by Ray, separating Actor, Reward, Reference, and Critic models across GPUs for scalable training. It integrates vLLM with Auto Tensor Parallelism (AutoTP) for accelerated sample generation, which constitutes the majority of RLHF training time. Memory efficiency is achieved through DeepSpeed's ZeRO-3 and AutoTP, allowing large model training without heavy frameworks. The PPO implementation incorporates advanced techniques for stability and reward quality.
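The PPO stability techniques mentioned above build on PPO's core clipped surrogate objective. As a minimal illustration (plain Python with toy values, not OpenRLHF's actual implementation), the per-token loss can be sketched as:

```python
import math

def ppo_clip_loss(logprob_new, logprob_old, advantage, clip_eps=0.2):
    """Clipped PPO surrogate loss for a single token (to be minimized)."""
    ratio = math.exp(logprob_new - logprob_old)           # pi_new / pi_old
    unclipped = ratio * advantage
    # Clamp the ratio to [1 - eps, 1 + eps] before weighting the advantage.
    clipped = max(min(ratio, 1 + clip_eps), 1 - clip_eps) * advantage
    # Pessimistic bound: take the worse of the two, negate for minimization.
    return -min(unclipped, clipped)

# Toy example: the new policy slightly favors a token with positive advantage.
loss = ppo_clip_loss(logprob_new=-0.9, logprob_old=-1.0, advantage=0.5)
print(round(loss, 4))  # → -0.5526
```

The clipping keeps each update close to the old (reference) policy, which is why the Reference model in the architecture above must stay resident alongside the Actor during training.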
Quick Start & Requirements
docker run --runtime=nvidia -it --rm --shm-size="10g" --cap-add=SYS_ADMIN -v $PWD:/openrlhf nvcr.io/nvidia/pytorch:24.07-py3 bash
sudo pip uninstall xgboost transformer_engine flash_attn pynvml -y
pip install openrlhf

vLLM support requires pip install openrlhf[vllm] or openrlhf[vllm_latest].
Highlighted Details
Maintenance & Community
The project is actively maintained with frequent updates and contributions from various organizations including Google, ByteDance, Tencent, and Alibaba. Community engagement is encouraged via email or GitHub issues.
Licensing & Compatibility
The project is licensed under Apache 2.0, allowing for commercial use and integration with closed-source projects.
Limitations & Caveats
While highly performant, setup can be complex, especially for distributed training. The README cautions that reported benchmark data may be outdated and recommends re-testing against the performance tuning section. Some advanced features, such as RingAttention, require additional installations.