VLM research for reinforcing generalization with minimal cost
R1-V is an open-source framework for reinforcement learning with verifiable rewards (RLVR) in Vision-Language Models (VLMs), aiming to enhance generalization capabilities at minimal cost. It targets researchers and developers working on visual agents and general vision-language intelligence, offering improved algorithmic efficiency and task diversity.
How It Works
The project applies reinforcement learning with verifiable rewards, specifically GRPO (Group Relative Policy Optimization), to fine-tune VLMs. For each prompt the policy samples a group of completions, scores them against verifiable targets, and is updated toward completions that outperform their group, improving the model's ability to generalize across visual reasoning tasks and potentially yielding more robust and capable models.
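To make this concrete, here is a minimal sketch of the group-relative scoring at the heart of GRPO, using a toy exact-match reward as a stand-in for R1-V's actual reward functions; the repository's trainer is more elaborate, but the idea of replacing a learned critic with group statistics is the same.

```python
# Minimal sketch of GRPO's group-relative scoring with a toy verifiable reward.
# R1-V's real reward functions and training loop live in the repository; this
# only illustrates scoring completions against their siblings for one prompt.
from typing import List

def verifiable_reward(completion: str, answer: str) -> float:
    """Return 1.0 if the completion exactly matches the ground-truth answer, else 0.0."""
    return 1.0 if completion.strip() == answer.strip() else 0.0

def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Score each completion relative to the group sampled for the same prompt."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

# Four sampled answers to one visual counting question whose ground truth is "3".
completions = ["3", "2", "3", "4"]
rewards = [verifiable_reward(c, "3") for c in completions]
print(group_relative_advantages(rewards))  # correct completions get positive advantage
```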
Quick Start & Requirements
Create a conda environment (conda create -n r1-v python=3.11) and activate it, then run bash setup.sh. Ensure your environment aligns with ./src/requirements.txt. Key dependencies include vllm==0.7.2 (for accelerated training), deepspeed, wandb, and flash_attention_2.
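Before training, the listed dependencies can be sanity-checked with a short script like the one below. This helper is a hypothetical addition, not part of the repository, and assumes the PyPI distribution names shown (flash_attention_2 is provided by the flash-attn package).

```python
# Hypothetical environment check for the key dependencies listed above.
from importlib.metadata import version, PackageNotFoundError

requirements = {
    "vllm": "0.7.2",      # pinned version used for accelerated training
    "deepspeed": None,    # no specific pin noted
    "wandb": None,
    "flash-attn": None,   # backs the flash_attention_2 implementation
}

for package, pinned in requirements.items():
    try:
        installed = version(package)
    except PackageNotFoundError:
        print(f"MISSING   {package}")
        continue
    if pinned is not None and installed != pinned:
        print(f"MISMATCH  {package}=={installed} (expected {pinned})")
    else:
        print(f"OK        {package}=={installed}")
```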
Highlighted Details
Maintenance & Community
The project is actively maintained with recent updates in February 2025, including support for new models and bug fixes. The team welcomes community contributions and ideas, particularly for issues marked "help wanted."
Licensing & Compatibility
The repository does not explicitly state a license in the README. However, it acknowledges contributions from various projects with different licenses (e.g., Apache 2.0 for DeepSeek, MIT for QwenVL). Users should verify licensing for commercial use.
Limitations & Caveats
A bug related to batched training was noted, with a recommendation to use per_device_train_batch_size=1 to reproduce reported results. OOM errors can occur; reducing --num_generations or enabling vLLM for generation helps. The project also notes that enforcing Chain-of-Thought reasoning may be detrimental to smaller models.
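As an illustration only, the sketch below assembles a launch command that applies these workarounds. The torchrun wrapper, GPU count, and script path are assumptions (check the repository for the real training entry point); the two flags come directly from the caveats above.

```python
# Illustrative only: build conservative launch overrides reflecting the caveats above.
import shlex

TRAIN_SCRIPT = "src/open_r1/grpo.py"  # placeholder path; see the repository for the real entry point

overrides = {
    "--per_device_train_batch_size": "1",  # workaround for the reported batched-training bug
    "--num_generations": "4",              # example reduced value; lower further if OOM persists
}

cmd = ["torchrun", "--nproc_per_node", "8", TRAIN_SCRIPT]  # adjust to your GPU count
for flag, value in overrides.items():
    cmd += [flag, value]

print(shlex.join(cmd))  # inspect the full command before launching
```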