EasyR1 by hiyouga

RL training framework for multi-modality models

Created 6 months ago
3,603 stars

Top 13.5% on SourcePulse

Project Summary

EasyR1 is an efficient, scalable reinforcement learning (RL) training framework for multi-modality vision-language models (VLMs). Built on the veRL project, it gives researchers and engineers working with VLMs a high-performance solution for tasks like fine-tuning and policy optimization.

How It Works

EasyR1 leverages a HybridEngine design and vLLM's SPMD mode for efficiency and scalability. It supports various RL algorithms such as GRPO, Reinforce++, ReMax, and RLOO, and can process diverse text, vision-text, and multi-image-text datasets. Key features include padding-free training and robust logging integration with multiple platforms.
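
To make the algorithm list more concrete, the sketch below shows the per-prompt advantage estimates that characterize two of the listed methods, GRPO (rewards standardized within a group of sampled responses) and RLOO (each response baselined against the mean reward of the others). It is a stand-alone illustration, not EasyR1's code: the function names grpo_advantages and rloo_advantages, the eps constant, the use of the population standard deviation, and the sample rewards are all assumptions made for the example.

```python
from statistics import mean, pstdev
from typing import List


def grpo_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantages: standardize each reward within its group.

    Assumption: population standard deviation; real implementations may
    use the sample standard deviation or a different eps.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def rloo_advantages(rewards: List[float]) -> List[float]:
    """Leave-one-out advantages: the baseline for each response is the
    mean reward of the other responses sampled for the same prompt."""
    n = len(rewards)
    if n < 2:
        raise ValueError("RLOO needs at least two responses per prompt")
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]


if __name__ == "__main__":
    # Hypothetical rewards for four responses to one prompt.
    rewards = [1.0, 0.0, 0.0, 1.0]
    print("GRPO:", grpo_advantages(rewards))
    print("RLOO:", rloo_advantages(rewards))
```

Both estimators need several responses per prompt, which is one reason fast batched generation matters for this style of training.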

Quick Start & Requirements

  • Install: pip install -e . within the cloned repository. A Dockerfile is provided for environment setup.
  • Prerequisites: Python 3.9+, transformers>=4.51.0, flash-attn>=2.4.3, vllm>=0.8.3. CUDA 12.6 and cuDNN are recommended via the provided Docker image.
  • Hardware: VRAM requirements vary by model size and precision (e.g., 2x 24GB for 1.5B models with AMP, 8x 80GB for 32B models with BF16).
  • Links: Tutorial: https://github.com/hiyouga/EasyR1/blob/main/examples/qwen2_5_vl_7b_geo3k_grpo.sh
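
As a quick sanity check before training, the prerequisites above can be compared against the current environment. The following is a minimal sketch using only the Python standard library; the helper names (version_tuple, installed_version, check_environment) are hypothetical and not part of EasyR1, and the version parsing is deliberately simplified (pre-release and local suffixes are ignored).

```python
import sys
from importlib import metadata
from typing import Dict, Optional, Tuple

# Minimum versions as listed in the prerequisites above.
MIN_VERSIONS: Dict[str, str] = {
    "transformers": "4.51.0",
    "flash-attn": "2.4.3",
    "vllm": "0.8.3",
}


def version_tuple(version: str) -> Tuple[int, ...]:
    """Turn '4.51.0' or '0.8.3+cu126' into a comparable tuple of ints."""
    parts = []
    for piece in version.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    return tuple(parts)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_environment() -> Dict[str, str]:
    """Return a short status string for Python and each listed package."""
    report = {"python": "ok" if sys.version_info >= (3, 9) else "too old"}
    for package, minimum in MIN_VERSIONS.items():
        found = installed_version(package)
        if found is None:
            report[package] = "missing"
        elif version_tuple(found) >= version_tuple(minimum):
            report[package] = "ok"
        else:
            report[package] = f"too old ({found} < {minimum})"
    return report


if __name__ == "__main__":
    for name, status in check_environment().items():
        print(f"{name}: {status}")
```

This check does not cover CUDA or cuDNN; for those, the provided Docker image remains the simpler route.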

Highlighted Details

  • Supports Llama3, Qwen2/2.5, and DeepSeek-R1 models.
  • Enables BF16 training for reduced VRAM usage.
  • Reproduces baselines from the R1-V project (e.g., CLEVR-70k-Counting, GeoQA-8k).
  • Integrates with Wandb, SwanLab, MLflow, and TensorBoard for experiment tracking.

Maintenance & Community

The project is a fork of veRL and cites its core contributors. A WeChat group is available for discussion.

Licensing & Compatibility

The project does not explicitly state a license in the README. It is a fork of veRL, which is Apache 2.0 licensed. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Vision-language models are not yet compatible with Ulysses parallelism. LoRA support is planned but not yet implemented. The project focuses solely on RL training and does not provide scripts for supervised fine-tuning or inference.

Health Check

  • Last commit: 14 hours ago
  • Responsiveness: 1 day
  • Pull requests (30d): 10
  • Issues (30d): 23
  • Star history: 242 stars in the last 30 days

Explore Similar Projects

Starred by Vincent Weisser (Cofounder of Prime Intellect), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 4 more.

simpleRL-reason by hkust-nlp

0.1%
4k
RL recipe for reasoning ability in models
Created 7 months ago
Updated 1 month ago