Survey paper on large multimodal reasoning models
This repository provides a comprehensive survey of Large Multimodal Reasoning Models (LMRMs), detailing their evolution from modular systems to sophisticated language-centric frameworks. It targets researchers and practitioners in AI, offering a structured overview of LMRMs' capabilities, datasets, benchmarks, and future directions, particularly towards native multimodal reasoning.
How It Works
The survey categorizes LMRMs into three stages: perception-driven reasoning (modular networks, vision-language models), language-centric short reasoning (prompt-based, structural, externally augmented), and language-centric long reasoning (cross-modal, MM-O1, MM-R1). It emphasizes the progression towards "native" LMRMs capable of agentic, omni-modal understanding and generative reasoning.
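Purely as an illustration of the taxonomy described above (not code from the repository), a minimal Python sketch of the three-stage categorization might look like the following; the stage and paradigm names mirror the survey's terminology, while the data structure and helper function are hypothetical conveniences:

```python
# Hypothetical sketch of the survey's three-stage LMRM taxonomy;
# the dictionary layout is illustrative, not part of the repository.
LMRM_TAXONOMY = {
    "Stage 1: Perception-Driven Reasoning": [
        "modular reasoning networks",
        "vision-language models",
    ],
    "Stage 2: Language-Centric Short Reasoning": [
        "prompt-based reasoning",
        "structural reasoning",
        "externally augmented reasoning",
    ],
    "Stage 3: Language-Centric Long Reasoning": [
        "cross-modal reasoning",
        "MM-O1 (o1-style multimodal models)",
        "MM-R1 (R1-style multimodal models)",
    ],
}

def list_paradigms(stage: str) -> list[str]:
    """Return the reasoning paradigms grouped under a given stage."""
    return LMRM_TAXONOMY.get(stage, [])

if __name__ == "__main__":
    for stage, paradigms in LMRM_TAXONOMY.items():
        print(stage)
        for paradigm in paradigms:
            print(f"  - {paradigm}")
```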
Quick Start & Requirements
This is a survey repository, not a runnable codebase. It links to numerous research papers and datasets.
Highlighted Details
Maintenance & Community
The repository is maintained by the HITsz-TMG group, with regular updates based on community contributions via issues or email. Contact information for contributors is provided.
Licensing & Compatibility
The repository primarily serves as a curated list of research papers and datasets; consult the repository itself for its license, and note that each linked paper, dataset, and model is governed by its own licensing terms.
Limitations & Caveats
As a survey, it does not provide executable code. Given the rapid pace of LMRM development, some entries may become dated quickly, though the repository aims for continuous updates.