Curated list of multimodal AI research papers
This repository is a curated list of academic papers on multimodal AI, intended for researchers and practitioners in artificial intelligence. It aims to provide a comprehensive, up-to-date catalog of significant publications so that the multimodal AI community can discover and share work more easily.
How It Works
The repository functions as a living bibliography organized by sub-category within multimodal AI, including Visual Understanding, Omni Understanding, Unified Understanding and Generation, Diffusion MLLM, and Multimodal Embedding/Retrieval, among others. Each entry typically lists the paper's title, venue, and publication date, plus links to code repositories or project pages when available.
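For readers who want to work with such a list programmatically, the entry structure described above can be sketched as a small record type. This is only an illustration: the class name PaperEntry, its field names, the date format, and the helper group_by_category are all hypothetical and are not defined by the repository, which is a plain list rather than a dataset.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class PaperEntry:
    # Hypothetical record; fields mirror those the list typically shows.
    title: str
    category: str                      # sub-category, e.g. "Visual Understanding"
    venue: str
    date: str                          # publication date; string format is an assumption
    code_url: Optional[str] = None     # present only when the authors released code
    project_url: Optional[str] = None  # present only when a project page exists


def group_by_category(entries):
    """Group entries by sub-category, keeping input order within each group."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry.category].append(entry)
    return dict(groups)


# Placeholder entries for illustration only; these are not real papers.
entries = [
    PaperEntry("Placeholder Paper A", "Visual Understanding", "Venue X", "2025-01"),
    PaperEntry("Placeholder Paper B", "Diffusion MLLM", "Venue Y", "2025-02",
               code_url="https://example.com/code-b"),
    PaperEntry("Placeholder Paper C", "Visual Understanding", "Venue Z", "2025-03"),
]
by_category = group_by_category(entries)
```

The optional link fields default to None because, as noted above, code and project pages are listed only when available.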
Quick Start & Requirements
This repository is a curated list and does not require installation or execution. Users can browse the categorized lists of papers directly.
Maintenance & Community
The repository is maintained by "friedrichor" and encourages community contributions via GitHub issues for new paper submissions.
Licensing & Compatibility
The repository itself, as a collection of links and metadata, does not impose specific licensing restrictions beyond those of the linked academic papers and their associated code. It is compatible with general research and academic use.
Limitations & Caveats
The list is a curated collection and may not be exhaustive. The availability and quality of linked code or project pages depend on the original authors. The primary value is in the organization and discovery of research papers.