Awesome-Large-Multimodal-Reasoning-Models by HITsz-TMG

Survey paper on large multimodal reasoning models

created 3 months ago
444 stars

Top 68.7% on sourcepulse

Project Summary

This repository provides a comprehensive survey of Large Multimodal Reasoning Models (LMRMs), detailing their evolution from modular systems to sophisticated language-centric frameworks. It targets researchers and practitioners in AI, offering a structured overview of LMRMs' capabilities, datasets, benchmarks, and future directions, particularly towards native multimodal reasoning.

How It Works

The survey categorizes LMRMs into three stages: perception-driven reasoning (modular networks, vision-language models), language-centric short reasoning (prompt-based, structural, externally augmented), and language-centric long reasoning (cross-modal, MM-O1, MM-R1). It emphasizes the progression towards "native" LMRMs capable of agentic, omni-modal understanding and generative reasoning.
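The three-stage taxonomy described above can be written down as a small lookup structure. This is a minimal sketch and not part of the repository: the names `LMRM_STAGES` and `stage_of` are hypothetical, and only the stage and category labels come from the summary.

```python
from typing import Optional

# Illustrative only: LMRM_STAGES and stage_of are hypothetical names.
# The stage and category labels are copied from the survey summary.
LMRM_STAGES = {
    "perception-driven reasoning": [
        "modular networks",
        "vision-language models",
    ],
    "language-centric short reasoning": [
        "prompt-based",
        "structural",
        "externally augmented",
    ],
    "language-centric long reasoning": [
        "cross-modal",
        "MM-O1",
        "MM-R1",
    ],
}


def stage_of(category: str) -> Optional[str]:
    """Return the stage that lists the given category, or None if absent."""
    for stage, categories in LMRM_STAGES.items():
        if category in categories:
            return stage
    return None


if __name__ == "__main__":
    # Look up which stage a category belongs to.
    print(stage_of("MM-R1"))
```

A plain dictionary keeps the stages in the order the survey presents them, which matches the progression from perception-driven to long-form reasoning.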

Quick Start & Requirements

This is a survey repository, not a runnable codebase. It links to numerous research papers and datasets.

Highlighted Details

  • Comprehensive roadmap of LMRM development from 2016 to present.
  • Detailed tables of models, architectures, tasks, and datasets across different reasoning stages.
  • Discussion on future prospects, including agentic and omni-modal reasoning models.
  • Extensive categorization of multimodal datasets and benchmarks for understanding, generation, reasoning, and planning.

Maintenance & Community

The repository is maintained by the HITsz-TMG group, with regular updates based on community contributions via issues or email. Contact information for contributors is provided.

Licensing & Compatibility

The summary does not state the repository's license; a permissive one (e.g., MIT or Apache 2.0) is likely but unconfirmed. The repository primarily serves as a curated list of research papers, each of which carries its own license.

Limitations & Caveats

As a survey, it does not provide executable code. The rapid pace of LMRM development means some information may become dated quickly, though the repository aims for continuous updates.

Health Check
Last commit: 1 day ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 2
Star History
446 stars in the last 90 days
