verl is a production-ready reinforcement learning (RL) training library for large language models (LLMs), designed for flexibility, efficiency, and scalability. It lets researchers and engineers implement and train LLMs with a range of RL algorithms, integrates with existing LLM infrastructure, and supports diverse hardware configurations.
How It Works
verl uses a hybrid-controller programming model that lets complex post-training dataflows be expressed flexibly and executed efficiently, simplifying the implementation of RL algorithms such as PPO and GRPO. By decoupling computation from data dependencies, it integrates with popular LLM frameworks such as FSDP, Megatron-LM, and vLLM, and it supports flexible device mapping for efficient resource utilization across GPU clusters.
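To make this concrete, the sketch below is a minimal, runnable toy of the single-controller loop that this design enables: the driver expresses the RL dataflow as plain Python, while in a real run each call would fan out to distributed worker groups (FSDP or Megatron-LM for training, vLLM or SGLang for rollout). All class and method names here are illustrative stand-ins, not verl's actual API.

import random

class ToyWorkers:
    """Stand-in for distributed worker groups; everything runs locally here."""

    def generate_sequences(self, prompts):
        # Rollout phase: a real framework dispatches this to an inference engine.
        return [{"prompt": p, "response": f"answer to {p}"} for p in prompts]

    def compute_reward(self, batch):
        # Placeholder scoring; a real run would use a reward model or rule-based check.
        for sample in batch:
            sample["reward"] = random.random()
        return batch

    def compute_advantages(self, batch):
        # Group-mean baseline, in the spirit of GRPO-style advantage estimation.
        baseline = sum(s["reward"] for s in batch) / len(batch)
        for sample in batch:
            sample["advantage"] = sample["reward"] - baseline
        return batch

    def update_actor(self, batch):
        # Training phase: a real framework runs a distributed policy update here.
        # Returns a toy summary statistic standing in for a training loss.
        return sum(abs(s["advantage"]) for s in batch) / len(batch)

workers = ToyWorkers()
for prompts in [["2+2=?", "3*3=?"]]:  # stand-in for a dataloader
    batch = workers.generate_sequences(prompts)
    batch = workers.compute_reward(batch)
    batch = workers.compute_advantages(batch)
    signal = workers.update_actor(batch)
    print(f"step done, toy loss proxy: {signal:.3f}")

The point of the pattern is that the control flow above stays in one script even when each method call executes on a different set of GPUs.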
Quick Start & Requirements
- Installation:
pip install verl
- Prerequisites: Python 3.8+, PyTorch 2.0+, Hugging Face Transformers, vLLM (v0.8.2+ recommended), and SGLang. A GPU with CUDA support is strongly recommended for efficient training.
- Documentation: Quickstart, Programming Guide (a condensed launch sketch follows below)
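As a rough illustration, a PPO run is launched from the command line with Hydra-style key=value overrides. The dataset paths and exact config keys below are assumptions; check the Quickstart for the authoritative form:

python3 -m verl.trainer.main_ppo \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \
    actor_rollout_ref.rollout.name=vllm \
    trainer.n_gpus_per_node=1 \
    trainer.total_epochs=1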
Highlighted Details
- Achieves state-of-the-art throughput by integrating with existing LLM training engines (e.g., FSDP, Megatron-LM) and inference engines (e.g., vLLM, SGLang).
- Provides efficient actor-model resharding via 3D-HybridEngine, reducing memory redundancy and communication overhead when switching between training and generation.
- Compatible with Hugging Face Transformers and the ModelScope Hub, supporting models such as Qwen2.5, Llama 3.1, and Gemma 2.
- Implements a wide range of RL algorithms, including PPO, GRPO, ReMax, and DAPO, with support for vision-language models (VLMs); a configuration sketch follows this list.
- Scales to 70B-parameter models and hundreds of GPUs, with experiment tracking via wandb, swanlab, mlflow, and tensorboard.
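As a sketch of how these options are selected, the advantage estimator and experiment tracker are chosen with command-line overrides, combined with the data and model settings from the quickstart sketch above. The key names here are assumed from the Hydra-style config and should be verified against the docs:

python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    trainer.logger='["console","wandb"]'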
Maintenance & Community
- Initiated by the ByteDance Seed team and maintained by the verl community.
- Active development with regular releases and presentations at major conferences (e.g., EuroSys, NeurIPS).
- Community channels available via Twitter (@verl_project).
- Roadmap available at GitHub Issues.
Licensing & Compatibility
- Apache 2.0 License.
- Permissive license suitable for commercial use and integration with closed-source projects.
Limitations & Caveats
- Megatron-LM backend support for AMD GPUs is noted as "coming soon."
- Users should avoid vLLM 0.7.x due to known bugs; a version pin is sketched below.
- The project is under active development; breaking changes may occur and are tracked in a dedicated discussion thread.
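For the vLLM caveat above, a minimal way to stay off the 0.7.x series is to pin the version at install time:

pip install "vllm>=0.8.2"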