EasyR1 by hiyouga

RL training framework for multi-modality models

created 5 months ago
3,194 stars

Top 15.4% on sourcepulse

Project Summary

EasyR1 is an efficient, scalable reinforcement learning (RL) training framework for multi-modality vision-language models (VLMs), built on the veRL project. It targets researchers and engineers working with VLMs who need a high-performance solution for tasks like fine-tuning and policy optimization.

How It Works

EasyR1 combines a HybridEngine design with vLLM's SPMD mode for efficiency and scalability. It supports RL algorithms including GRPO, Reinforce++, ReMax, and RLOO, and can train on text, vision-text, and multi-image-text datasets. Other key features are padding-free training and logging integration with several experiment-tracking platforms.
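
To make two of the listed algorithms more concrete, here is a minimal sketch of how GRPO and RLOO are commonly described as turning a group of sampled rewards for one prompt into per-sample advantages. It is not taken from EasyR1's code: the function names, the eps constant, and the choice of population standard deviation are assumptions for illustration, and real implementations may differ in these details.

```python
from statistics import mean, pstdev
from typing import List


def grpo_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Group-relative advantages: standardize each reward within its group.

    Assumption: population standard deviation; some implementations use
    the sample standard deviation instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every reward in the group is equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


def rloo_advantages(rewards: List[float]) -> List[float]:
    """Leave-one-out advantages: each sample's baseline is the mean reward
    of the other samples drawn for the same prompt."""
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]


if __name__ == "__main__":
    group = [1.0, 0.0, 0.0, 1.0]  # hypothetical rewards for four responses
    print(grpo_advantages(group))
    print(rloo_advantages(group))
```

In both cases the baseline comes from the other responses to the same prompt rather than from a learned value model.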

Quick Start & Requirements

  • Install: pip install -e . within the cloned repository. A Dockerfile is provided for environment setup.
  • Prerequisites: Python 3.9+, transformers>=4.51.0, flash-attn>=2.4.3, vllm>=0.8.3. CUDA 12.6 and cuDNN are recommended via the provided Docker image.
  • Hardware: VRAM requirements vary by model size and precision (e.g., 2x 24GB for 1.5B models with AMP, 8x 80GB for 32B models with BF16).
  • Links: Tutorial: https://github.com/hiyouga/EasyR1/blob/main/examples/qwen2_5_vl_7b_geo3k_grpo.sh
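
The version floors listed above can be checked before launching a run. The following is a rough sketch, not part of EasyR1: the helper names (parse_version, meets_minimum, check_environment) are hypothetical, and the comparison only looks at the leading numeric components of each version string, so pre-release or local suffixes such as ".post1" are ignored.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Tuple

# Minimum versions as listed in the prerequisites above.
MINIMUMS = {"transformers": "4.51.0", "flash-attn": "2.4.3", "vllm": "0.8.3"}
MIN_PYTHON = (3, 9)


def parse_version(text: str) -> Tuple[int, ...]:
    """Return the leading numeric components, e.g. '0.8.3.post1' -> (0, 8, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare numeric components, padding the shorter tuple with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_environment() -> Dict[str, str]:
    """Map each requirement to 'ok', 'too old (<version>)', or 'missing'."""
    report = {"python": "ok" if sys.version_info[:2] >= MIN_PYTHON else "too old"}
    for name, minimum in MINIMUMS.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "ok" if meets_minimum(installed, minimum) else f"too old ({installed})"
    return report


if __name__ == "__main__":
    for requirement, status in check_environment().items():
        print(f"{requirement}: {status}")
```

A check like this only confirms Python package versions; the CUDA and cuDNN setup still has to come from the Docker image or a manual install.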

Highlighted Details

  • Supports Llama3, Qwen2/2.5, and DeepSeek-R1 models.
  • Enables BF16 training for reduced VRAM usage.
  • Reproduces baselines from the R1-V project (e.g., CLEVR-70k-Counting, GeoQA-8k).
  • Integrated with Wandb, SwanLab, Mlflow, and Tensorboard for experiment tracking.

Maintenance & Community

The project is a fork of veRL and cites its core contributors. A WeChat group is available for discussion.

Licensing & Compatibility

The project does not explicitly state a license in the README. It is a fork of veRL, which is Apache 2.0 licensed. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Vision-language models are not yet compatible with Ulysses parallelism. LoRA support is planned but not yet implemented. The project focuses solely on RL training and does not provide scripts for supervised fine-tuning or inference.

Health Check

  • Last commit: 2 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 19
  • Issues (30d): 26
  • Star History: 989 stars in the last 90 days

Explore Similar Projects

Starred by Ross Taylor (Cofounder of General Reasoning; Creator of Papers with Code), Daniel Han (Cofounder of Unsloth), and 4 more.

open-instruct by allenai

0.2%
3k
Training codebase for instruction-following language models
created 2 years ago
updated 14 hours ago
Starred by Lewis Tunstall (Researcher at Hugging Face), Robert Nishihara (Cofounder of Anyscale; Author of Ray), and 4 more.

verl by volcengine

2.4%
12k
RL training library for LLMs
created 9 months ago
updated 14 hours ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Didier Lopes (Founder of OpenBB), and 10 more.

JARVIS by microsoft

0.1%
24k
System for LLM-orchestrated AI task automation
created 2 years ago
updated 4 days ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jeff Hammerbacher (Cofounder of Cloudera), and 10 more.

open-r1 by huggingface

0.2%
25k
SDK for reproducing DeepSeek-R1
created 6 months ago
updated 3 days ago