OpenRLHF by OpenRLHF

RLHF framework for scalable training of large language models

created 2 years ago
7,543 stars

Top 7.0% on sourcepulse

Project Summary

OpenRLHF is a high-performance framework for Reinforcement Learning from Human Feedback (RLHF), designed for ease of use and scalability. It targets researchers and engineers working with large language models, enabling efficient training and fine-tuning of models with 70B+ parameters. The framework simplifies complex RLHF workflows while offering notable speedups over baselines such as DSChat and improved memory efficiency.

How It Works

OpenRLHF employs a distributed architecture powered by Ray, separating the Actor, Reward, Reference, and Critic models across GPUs for scalable training. It integrates vLLM with Auto Tensor Parallelism (AutoTP) to accelerate sample generation, which accounts for the majority of RLHF training time. Memory efficiency comes from DeepSpeed's ZeRO-3 together with AutoTP, enabling large-model training without heavyweight model-parallel frameworks. The PPO implementation incorporates established implementation tricks for training stability and reward quality.
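To make the distributed layout concrete, the sketch below shows roughly how such a Ray-based PPO run is launched. It is a minimal, illustrative example: the module path (openrlhf.cli.train_ppo_ray) and the flag names follow upstream examples but may differ between releases, and the model/dataset placeholders are assumptions.

    # Start a Ray head node; additional machines join with `ray start --address=<head>:6379`.
    ray start --head --node-ip-address 0.0.0.0

    # Submit the PPO job: Actor, Reference, Reward, and Critic models get their own GPU
    # groups, and dedicated vLLM engines handle rollout generation.
    ray job submit --address="http://127.0.0.1:8265" \
       -- python3 -m openrlhf.cli.train_ppo_ray \
       --ref_num_nodes 1 --ref_num_gpus_per_node 2 \
       --reward_num_nodes 1 --reward_num_gpus_per_node 2 \
       --critic_num_nodes 1 --critic_num_gpus_per_node 2 \
       --actor_num_nodes 1 --actor_num_gpus_per_node 2 \
       --vllm_num_engines 2 --vllm_tensor_parallel_size 2 \
       --pretrain <sft_model_path> \
       --reward_pretrain <reward_model_path> \
       --prompt_data <prompt_dataset> \
       --zero_stage 3 --bf16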

Quick Start & Requirements

  • Installation: Docker is recommended. Start the container, remove conflicting packages, then install the package (a minimal training sketch follows this list):
      docker run --runtime=nvidia -it --rm --shm-size="10g" --cap-add=SYS_ADMIN -v $PWD:/openrlhf nvcr.io/nvidia/pytorch:24.07-py3 bash
      sudo pip uninstall xgboost transformer_engine flash_attn pynvml -y
      pip install openrlhf
    vLLM support requires pip install openrlhf[vllm] or openrlhf[vllm_latest].
  • Prerequisites: NVIDIA GPU, CUDA, Python. vLLM 0.8.3 or higher is recommended.
  • Resources: Requires significant GPU resources for training large models.
  • Docs: see the project documentation linked from the GitHub repository.
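
Once installed, a first non-distributed run can be launched with DeepSpeed on a single node. The following is a minimal sketch in the spirit of the project's SFT recipes; the module path (openrlhf.cli.train_sft), the example dataset, and the exact flag names are assumptions based on upstream examples and should be verified against the installed version.

    # Single-node supervised fine-tuning (illustrative; check flags against your version).
    deepspeed --module openrlhf.cli.train_sft \
       --pretrain meta-llama/Meta-Llama-3-8B \
       --dataset Open-Orca/OpenOrca \
       --input_key question \
       --output_key response \
       --train_batch_size 128 \
       --micro_train_batch_size 2 \
       --max_len 4096 \
       --zero_stage 2 \
       --bf16 \
       --flash_attn \
       --learning_rate 5e-6 \
       --max_epochs 1 \
       --save_path ./checkpoint/llama3-8b-sft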

Highlighted Details

  • Supports multiple RLHF algorithms: PPO, REINFORCE++, GRPO, DPO, IPO, KTO.
  • Achieves high throughput via vLLM integration and optimized sample generation.
  • Enables training of models >70B parameters using ZeRO-3 and AutoTP.
  • Features include LoRA/QLoRA support, FlashAttention2, RingAttention, and Mixture of Experts (MoE); see the DPO-with-LoRA sketch after this list.
  • Offers performance benchmarks showing significant speedups over DSChat.
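
To illustrate how an alternative algorithm combines with parameter-efficient fine-tuning, the sketch below runs DPO with LoRA adapters enabled, as referenced in the list above. The module path (openrlhf.cli.train_dpo), the preference-data keys, and the LoRA flags are assumptions drawn from upstream examples and may vary by release.

    # DPO training with LoRA adapters (illustrative; flag names may vary by release).
    deepspeed --module openrlhf.cli.train_dpo \
       --pretrain <sft_model_path> \
       --dataset <preference_dataset> \
       --chosen_key chosen \
       --rejected_key rejected \
       --beta 0.1 \
       --lora_rank 16 \
       --lora_alpha 32 \
       --zero_stage 3 \
       --bf16 \
       --flash_attn \
       --save_path ./checkpoint/llama3-8b-dpo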

Maintenance & Community

The project is actively maintained with frequent updates and contributions from various organizations including Google, ByteDance, Tencent, and Alibaba. Community engagement is encouraged via email or GitHub issues.

Licensing & Compatibility

The project is licensed under Apache 2.0, allowing for commercial use and integration with closed-source projects.

Limitations & Caveats

While highly performant, setup can be complex, especially for distributed training. The README cautions that the published benchmark data may be outdated and recommends consulting the performance tuning section and re-testing on your own hardware. Some advanced features, such as RingAttention, require additional installations.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 9
  • Issues (30d): 17

Star History

1,071 stars in the last 90 days

Explore Similar Projects

Starred by Lewis Tunstall (Researcher at Hugging Face), Robert Nishihara (Cofounder of Anyscale; author of Ray), and 4 more.

verl by volcengine

Top 2.4% on sourcepulse · 12k stars
RL training library for LLMs
created 9 months ago, updated 14 hours ago