lmm-r1 by TideDra

RL framework for multimodal reasoning in 3B LMMs

created 5 months ago
798 stars

Top 45.0% on sourcepulse

Project Summary

This repository provides LMM-R1, a framework for enhancing the reasoning abilities of smaller (3B parameter) Large Multimodal Models (LMMs). It addresses the limitations of small models in complex reasoning tasks and the scarcity of high-quality multimodal reasoning data by employing a two-stage, rule-based Reinforcement Learning (RL) approach. The target audience includes researchers and developers working with LMMs who need to improve their reasoning capabilities, particularly in multimodal contexts.

How It Works

LMM-R1 utilizes a two-stage RL framework: Foundational Reasoning Enhancement (FRE) and Multimodal Generalization Training (MGT). FRE leverages text-only data to build a strong reasoning foundation, while MGT extends these capabilities to multimodal inputs. This staged approach is designed to overcome data limitations and efficiently improve performance on diverse reasoning tasks, offering a more robust and scalable method for LMM reasoning enhancement compared to direct training.
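
As a rough illustration of what the rule-based reward in this setup can look like (a hedged sketch in the DeepSeek-R1 style, not the repository's actual reward code; the tag names and weights are assumptions), each completion is scored on a format rule plus an exact-match accuracy rule:

    import re

    def rule_based_reward(completion: str, ground_truth: str) -> float:
        """Illustrative rule-based reward: format bonus plus accuracy bonus."""
        reward = 0.0
        # Format rule: reasoning inside <think>...</think>, final answer
        # inside <answer>...</answer>.
        if re.search(r"<think>.*?</think>\s*<answer>.*?</answer>", completion, re.DOTALL):
            reward += 0.5
        # Accuracy rule: the extracted answer must match the ground truth.
        match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
        if match and match.group(1).strip() == ground_truth.strip():
            reward += 1.0
        return reward

    # A well-formatted, correct response earns the full reward of 1.5.
    print(rule_based_reward("<think>3x + 2 = 11, so x = 3.</think> <answer>3</answer>", "3"))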

Quick Start & Requirements

  • Installation:
    git clone https://github.com/TideDra/lmm-r1.git
    cd lmm-r1
    pip install -e .[vllm]
    pip install flash_attn --no-build-isolation
    
  • Prerequisites: vLLM 0.7.2 or higher is recommended. Dockerfiles are provided.
  • Datasets: Requires multimodal prompt datasets in OpenAI-compatible message format (JSON); see the record sketch after this list.
  • Links: OpenRLHF-M, Paper, Demo
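
A minimal sketch of one prompt record follows. The way the image is referenced and the placeholder paths are assumptions based on the OpenAI chat schema, not a spec taken from the repository:

    import json

    # One illustrative record: a user turn pairing an image with a question.
    # The image path and question text are placeholders.
    messages = [
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "file:///path/to/figure.png"}},
                {"type": "text", "text": "How many triangles are in the figure? Answer with a number."},
            ],
        }
    ]
    print(json.dumps(messages, indent=2))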

Highlighted Details

  • Supports PPO/REINFORCE++/RLOO training for LMMs, achieving a 4.7x speedup with RLOO over R1-V (GRPO).
  • Compatible with LMMs like Qwen2.5-VL, Phi3.5-V, and Phi4-Multimodal.
  • Integrates vLLM for accelerated generation and supports distributed training via Ray (see the generation sketch after this list).
  • Offers QLoRA and LoRA fine-tuning options, along with FlashAttention2 integration.
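
As a standalone illustration of the vLLM generation path (a generic usage sketch assuming the Qwen/Qwen2.5-VL-3B-Instruct checkpoint, not the rollout code lmm-r1 itself runs during RL training):

    from vllm import LLM, SamplingParams

    # Load the policy checkpoint into vLLM and sample one completion.
    llm = LLM(model="Qwen/Qwen2.5-VL-3B-Instruct", trust_remote_code=True)
    params = SamplingParams(temperature=0.7, max_tokens=512)
    outputs = llm.generate(["Solve: if 3x + 2 = 11, what is x?"], params)
    print(outputs[0].outputs[0].text)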

Maintenance & Community

The codebase has been merged into OpenRLHF-M, the official multimodal RL infrastructure from OpenRLHF. Further community engagement details are not explicitly provided in the README.

Licensing & Compatibility

The repository is released under the Apache 2.0 license, allowing for commercial use and integration with closed-source projects.

Limitations & Caveats

The project is presented as a DeepSeek-R1-style reproduction for multimodal models, and active development has moved to OpenRLHF-M. While it supports various LMMs, the primary focus is on enhancing reasoning in smaller 3B models.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 63 stars in the last 90 days
