EasyR1 by hiyouga

RL training framework for multi-modality models

Created 10 months ago

4,406 stars

Top 11.0% on SourcePulse


2 Experts Love This Project

Shizhe Diao

Author of LMFlow; Research Scientist at NVIDIA

Alex Chen

Cofounder of Nexa AI

Project Summary

EasyR1 is an efficient, scalable reinforcement learning (RL) training framework for multimodal vision-language models (VLMs). Built on the veRL project, it targets researchers and engineers who need a high-performance way to apply RL fine-tuning and policy optimization to VLMs.

How It Works

EasyR1 combines a HybridEngine design with vLLM's SPMD mode for efficiency and scalability. It supports several RL algorithms, including GRPO, Reinforce++, ReMax, and RLOO, and can train on text, vision-text, and multi-image-text datasets. Other key features include padding-free training and logging integration with several experiment-tracking platforms.
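GRPO and RLOO differ mainly in how they turn the rewards of several sampled responses to the same prompt into per-response advantages. The sketch below shows the textbook form of each estimator for one prompt's group of responses. It is not taken from EasyR1's source: the function names are hypothetical, real implementations operate on batched tensors, and details such as population vs. sample standard deviation may differ.

```python
from __future__ import annotations

import statistics


def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style group normalization: (r_i - mean) / (std + eps) within one prompt's group."""
    mean = statistics.fmean(rewards)
    # Population std is used here; some implementations use the sample (n - 1) std.
    std = statistics.pstdev(rewards)
    # eps keeps the division finite when every response got the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


def rloo_advantages(rewards: list[float]) -> list[float]:
    """RLOO leave-one-out baseline: r_i minus the mean reward of the other n - 1 responses."""
    n = len(rewards)
    if n < 2:
        raise ValueError("RLOO needs at least two responses per prompt")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]


if __name__ == "__main__":
    # Hypothetical binary correctness rewards for four sampled answers to one prompt.
    group = [1.0, 0.0, 0.0, 1.0]
    print("GRPO:", grpo_advantages(group))
    print("RLOO:", rloo_advantages(group))
```

Both estimators avoid a learned value model by using the other responses in the group as the baseline; GRPO additionally rescales by the group's spread, so a group where every answer earns the same reward contributes zero advantage.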

Quick Start & Requirements

Install: pip install -e . within the cloned repository. A Dockerfile is provided for environment setup.
Prerequisites: Python 3.9+, transformers>=4.51.0, flash-attn>=2.4.3, and vllm>=0.8.3. CUDA 12.6 and cuDNN are recommended; the provided Docker image is the suggested way to get them.
Hardware: VRAM requirements depend on model size and training precision (e.g., two 24 GB GPUs for a 1.5B model with AMP, eight 80 GB GPUs for a 32B model with BF16).
Links: Tutorial: https://github.com/hiyouga/EasyR1/blob/main/examples/qwen2_5_vl_7b_geo3k_grpo.sh
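Before running the tutorial script, it can help to confirm that the environment meets the minimum versions listed above. The following is a minimal sketch using only the standard library; the script and its helper names are hypothetical, not part of EasyR1, and it assumes the packages are installed under their PyPI distribution names (transformers, flash-attn, vllm).

```python
from __future__ import annotations

import re
import sys
from importlib import metadata

# Minimum versions as listed in the project's prerequisites.
MIN_VERSIONS = {
    "transformers": "4.51.0",
    "flash-attn": "2.4.3",
    "vllm": "0.8.3",
}


def parse_release(version: str) -> tuple[int, ...]:
    """Extract the leading numeric release, e.g. '2.7.4.post1' -> (2, 7, 4).

    Pre-release, post-release, and local suffixes (rc1, .post1, +cu124) are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, required: str) -> bool:
    """Compare release tuples, padding with zeros so (0, 9) compares like (0, 9, 0)."""
    a, b = parse_release(installed), parse_release(required)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_environment() -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 9):
        problems.append(f"Python {sys.version.split()[0]} is older than 3.9")
    for package, required in MIN_VERSIONS.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed (need >= {required})")
            continue
        if not meets_minimum(installed, required):
            problems.append(f"{package} {installed} is older than {required}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("problem:", issue)
    if not issues:
        print("all prerequisite checks passed")
```

A hand-rolled comparison keeps the check dependency-free; if the `packaging` library is available, `packaging.version.Version` handles pre-release ordering more precisely.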

Highlighted Details

Supports Llama3 and Qwen2/2.5 language models, Qwen2/2.5-VL vision-language models, and DeepSeek-R1 distilled models.
Supports pure BF16 training, which lowers VRAM requirements compared with AMP.
Reproduces baselines from the R1-V project (e.g., CLEVR-70k-Counting, GeoQA-8k).
Integrates with Weights & Biases (wandb), SwanLab, MLflow, and TensorBoard for experiment tracking.
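Part of why BF16 training needs less VRAM than AMP is the number of bytes stored per parameter for weights, gradients, and optimizer state. The sketch below is a rough rule-of-thumb estimate, not EasyR1's accounting: the byte counts per precision are assumptions (Adam optimizer, fp32 master weights and moments under AMP, everything in bf16 under pure BF16), the state is assumed to be sharded evenly across GPUs in FSDP/ZeRO-3 style, and activations, rollout KV cache, and framework overhead are ignored, so real usage is higher.

```python
from __future__ import annotations

# ASSUMED bytes per parameter for model state only (weights + gradients + Adam moments).
# Rule-of-thumb figures; not read from EasyR1's configuration.
BYTES_PER_PARAM = {
    # AMP: fp32 weights (4) + fp32 gradients (4) + fp32 Adam first and second moments (4 + 4)
    "amp": 16,
    # Pure BF16: bf16 weights (2) + bf16 gradients (2) + bf16 Adam moments (2 + 2)
    "bf16": 8,
}


def model_state_gb(num_params: float, precision: str, num_gpus: int = 1) -> float:
    """Per-GPU model-state memory in GB (1e9 bytes), assuming even sharding across GPUs.

    Excludes activations, rollout (vLLM) memory, and other runtime overhead.
    """
    if precision not in BYTES_PER_PARAM:
        raise ValueError(f"unknown precision {precision!r}; expected one of {sorted(BYTES_PER_PARAM)}")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    total_bytes = num_params * BYTES_PER_PARAM[precision]
    return total_bytes / num_gpus / 1e9


if __name__ == "__main__":
    for precision in ("amp", "bf16"):
        print(precision, model_state_gb(7e9, precision), "GB of model state for a 7B model on one GPU")
```

Under these assumptions, switching from AMP to pure BF16 halves the model-state footprint (for example, `model_state_gb(1.5e9, "amp")` gives 24.0 while `model_state_gb(1.5e9, "bf16")` gives 12.0), which is the direction of the savings the project describes.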

Maintenance & Community

The project is a fork of veRL and cites its core contributors. A WeChat group is available for discussion.

Licensing & Compatibility

The project does not explicitly state a license in the README. It is a fork of veRL, which is Apache 2.0 licensed. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Vision-language models are not yet compatible with Ulysses sequence parallelism. LoRA support is planned but not yet implemented. The project focuses solely on RL training and does not provide scripts for supervised fine-tuning or inference.

Health Check

Last Commit

6 days ago

Responsiveness

1 day

Star History

182 stars in the last 30 days