lmm-r1 by TideDra

RL framework for multimodal reasoning in 3B LMMs

created 5 months ago
798 stars

Top 45.0% on sourcepulse

Project Summary

This repository provides LMM-R1, a framework for enhancing the reasoning abilities of smaller (3B parameter) Large Multimodal Models (LMMs). It addresses the limitations of small models in complex reasoning tasks and the scarcity of high-quality multimodal reasoning data by employing a two-stage, rule-based Reinforcement Learning (RL) approach. The target audience includes researchers and developers working with LMMs who need to improve their reasoning capabilities, particularly in multimodal contexts.

How It Works

LMM-R1 utilizes a two-stage RL framework: Foundational Reasoning Enhancement (FRE) and Multimodal Generalization Training (MGT). FRE leverages text-only data to build a strong reasoning foundation, while MGT extends these capabilities to multimodal inputs. This staged approach is designed to overcome data limitations and efficiently improve performance on diverse reasoning tasks, offering a more robust and scalable method for LMM reasoning enhancement compared to direct training.
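
As a rough illustration of what the rule-based reward in this setup can look like (a hedged sketch in the DeepSeek-R1 style, not the repository's actual reward code; the tag names and weights are assumptions), each completion is scored on a format rule plus an exact-match accuracy rule:

    import re

    def rule_based_reward(completion: str, ground_truth: str) -> float:
        """Illustrative rule-based reward: format bonus plus accuracy bonus."""
        reward = 0.0
        # Format rule: reasoning inside <think>...</think>, final answer
        # inside <answer>...</answer>.
        if re.search(r"<think>.*?</think>\s*<answer>.*?</answer>", completion, re.DOTALL):
            reward += 0.5
        # Accuracy rule: the extracted answer must match the ground truth.
        match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
        if match and match.group(1).strip() == ground_truth.strip():
            reward += 1.0
        return reward

    # A well-formatted, correct response earns the full reward of 1.5.
    print(rule_based_reward("<think>3x + 2 = 11, so x = 3.</think> <answer>3</answer>", "3"))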

Quick Start & Requirements

  • Installation:
    git clone https://github.com/TideDra/lmm-r1.git
    cd lmm-r1
    pip install -e .[vllm]
    pip install flash_attn --no-build-isolation
    
  • Prerequisites: vLLM 0.7.2 or higher is recommended. Dockerfiles are provided.
  • Datasets: Requires multimodal prompt datasets in OpenAI-compatible message format (JSON); see the record sketch after this list.
  • Links: OpenRLHF-M, Paper, Demo
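
A minimal sketch of one prompt record follows. The way the image is referenced and the placeholder paths are assumptions based on the OpenAI chat schema, not a spec taken from the repository:

    import json

    # One illustrative record: a user turn pairing an image with a question.
    # The image path and question text are placeholders.
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "file:///path/to/figure.png"}},
                {"type": "text", "text": "How many triangles are in the figure? Answer with a number."},
            ],
        }
    ]
    print(json.dumps(messages, indent=2))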

Highlighted Details

  • Supports PPO/REINFORCE++/RLOO training for LMMs, achieving a 4.7x speedup with RLOO over R1-V (GRPO).
  • Compatible with LMMs like Qwen2.5-VL, Phi3.5-V, and Phi4-Multimodal.
  • Integrates vLLM for accelerated generation and supports distributed training via Ray (see the generation sketch after this list).
  • Offers QLoRA and LoRA fine-tuning options, along with FlashAttention2 integration.
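
As a standalone illustration of the vLLM generation path (a generic usage sketch assuming the Qwen/Qwen2.5-VL-3B-Instruct checkpoint, not the rollout code lmm-r1 itself runs during RL training):

    from vllm import LLM, SamplingParams

    # Load the policy checkpoint into vLLM and sample one completion.
    llm = LLM(model="Qwen/Qwen2.5-VL-3B-Instruct", trust_remote_code=True)
    params = SamplingParams(temperature=0.7, max_tokens=512)
    outputs = llm.generate(["Solve: if 3x + 2 = 11, what is x?"], params)
    print(outputs[0].outputs[0].text)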

Maintenance & Community

The codebase has been merged into OpenRLHF-M, the official multimodal RL infrastructure from OpenRLHF. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The repository is released under the Apache 2.0 license, allowing for commercial use and integration with closed-source projects.

Limitations & Caveats

The project is presented as a DeepSeek-R1-style reproduction for multimodal models, and active development has moved to OpenRLHF-M. While it supports various LMMs, the primary focus is on enhancing reasoning in smaller 3B models.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 63 stars in the last 90 days
