R1-V by StarsfieldAI

VLM research for reinforcing generalization with minimal cost

created 6 months ago
3,877 stars

Top 12.9% on sourcepulse

View on GitHub
Project Summary

R1-V is an open-source framework for reinforcement learning with verifiable rewards (RLVR) in Vision-Language Models (VLMs), aiming to enhance generalization capabilities at minimal cost. It targets researchers and developers working on visual agents and general vision-language intelligence, offering improved algorithm efficiency and task diversity.

How It Works

The project fine-tunes VLMs with reinforcement learning, specifically GRPO (Group Relative Policy Optimization): for each prompt the model samples a group of completions, scores them with verifiable rule-based rewards, and reinforces the completions that beat their group's average. Learning from the model's own generated data and this feedback aims to improve generalization across visual reasoning tasks, potentially yielding more robust and capable models.
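
In sketch form, the group-relative advantage computation that gives GRPO its name looks roughly like the code below. This is illustrative only: the function name, tensor shapes, and binary correctness rewards are assumptions for this example, not code taken from the repository.

    # Illustrative sketch, not repository code: rewards are normalized within each
    # group of sampled completions, so completions that beat their group's average
    # receive positive advantages under the clipped, critic-free GRPO objective.
    import torch

    def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
        """rewards: (num_prompts, num_generations), one scalar reward per completion,
        e.g. 1.0 if the model's counting answer is correct and 0.0 otherwise."""
        mean = rewards.mean(dim=-1, keepdim=True)
        std = rewards.std(dim=-1, keepdim=True)
        return (rewards - mean) / (std + eps)

    # Two prompts, four sampled completions each, binary correctness rewards.
    rewards = torch.tensor([[1.0, 0.0, 0.0, 1.0],
                            [0.0, 0.0, 1.0, 0.0]])
    print(group_relative_advantages(rewards))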

Quick Start & Requirements

  • Install: Use Conda to create an environment (conda create -n r1-v python=3.11) and activate it, then run bash setup.sh. Ensure your environment aligns with ./src/requirements.txt.
  • Prerequisites: Python 3.11, vllm==0.7.2 (for accelerated training), deepspeed, wandb, flash_attention_2.
  • Supported Models: Qwen2-VL, Qwen2.5-VL (a loading sketch follows this list).
  • Supported Datasets: CLEVR-70k-Counting, CLEVR-70k-Complex, GEOQA-8k.
  • Resources: Training requires significant computational resources (e.g., 8 GPUs for GRPO training). Evaluation scripts are provided.
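
As an illustration of how the prerequisites above come together, the snippet below loads one of the supported model families with flash_attention_2 via Hugging Face transformers; the model ID, dtype, and device settings are assumptions for this sketch, not steps prescribed by the README.

    # Illustrative only: loading a supported Qwen2-VL checkpoint with flash_attention_2.
    # The model ID and dtype are assumptions, not prescribed by the R1-V README.
    import torch
    from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

    model_id = "Qwen/Qwen2-VL-2B-Instruct"
    model = Qwen2VLForConditionalGeneration.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        attn_implementation="flash_attention_2",  # requires the flash-attn package
        device_map="auto",
    )
    processor = AutoProcessor.from_pretrained(model_id)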

Highlighted Details

  • Achieves 82.5% on SuperCLEVR with Qwen2VL-2B-Instruct-GRPO (100 steps).
  • Achieves 47.48% on GEOQA with Qwen2.5VL-3B-Instruct-GRPO (1 epoch).
  • Supports vLLM for accelerated training and inference (see the inference sketch after this list).
  • Offers both GRPO and SFT (Supervised Fine-Tuning) training code.
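
The sketch below shows what vLLM-accelerated inference with a Qwen2-VL checkpoint can look like; the model ID, prompt template, image path, and sampling settings are assumptions for illustration and may differ from the repository's own evaluation scripts.

    # Illustrative vLLM inference with a Qwen2-VL model; the prompt follows the
    # Qwen2-VL chat format, and the image path is a hypothetical counting example.
    from PIL import Image
    from vllm import LLM, SamplingParams

    llm = LLM(model="Qwen/Qwen2-VL-2B-Instruct")
    sampling = SamplingParams(temperature=0.0, max_tokens=256)

    image = Image.open("example_counting_scene.png")  # hypothetical input image
    prompt = (
        "<|im_start|>user\n"
        "<|vision_start|><|image_pad|><|vision_end|>"
        "How many items are there in the image?<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

    outputs = llm.generate(
        {"prompt": prompt, "multi_modal_data": {"image": image}},
        sampling_params=sampling,
    )
    print(outputs[0].outputs[0].text)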

Maintenance & Community

The project is actively maintained with recent updates in February 2025, including support for new models and bug fixes. The team welcomes community contributions and ideas, particularly for issues marked "help wanted."

Licensing & Compatibility

The repository does not explicitly state a license in the README. However, it acknowledges contributions from various projects with different licenses (e.g., Apache 2.0 for DeepSeek, MIT for QwenVL). Users should verify licensing for commercial use.

Limitations & Caveats

A bug affecting batched training has been noted, with a recommendation to set per_device_train_batch_size=1 when reproducing results. OOM errors can occur; reducing --num_generations or enabling vLLM is suggested (see the configuration sketch below). The project also notes that enforcing chain-of-thought reasoning may be detrimental to smaller models.
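
For illustration, those mitigations map onto training arguments roughly as follows, assuming a TRL-style GRPOConfig with similarly named options; the repository's actual launch scripts and flags may differ.

    # Illustrative training arguments reflecting the caveats above; assumes a
    # TRL-style GRPOConfig rather than the repository's exact launch flags.
    from trl import GRPOConfig

    config = GRPOConfig(
        output_dir="outputs/qwen2vl-2b-grpo",   # hypothetical output path
        per_device_train_batch_size=1,          # workaround for the batched-training bug
        num_generations=4,                      # lower this value if you hit OOM
        gradient_accumulation_steps=8,
        bf16=True,
        use_vllm=True,                          # hand generation off to vLLM
    )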

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 6
  • Star History: 280 stars in the last 90 days
