Curated unified multimodal models and research
This repository serves as a curated collection of papers, code, and resources on unified multimodal models, with the aim of advancing models that handle both vision-language understanding and generation. It targets researchers and practitioners in AI, providing a centralized hub for the latest advancements in this rapidly evolving area.
How It Works
The repository organizes research papers chronologically, highlighting key contributions to unifying multimodal understanding and generation. It focuses on models that build on large language models (LLMs) and incorporate visual data through various tokenization and fusion strategies, typically generating outputs autoregressively or via diffusion. This organization gives readers a comprehensive overview of the state of the art in building single models that handle diverse multimodal tasks.
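To make the autoregressive unification idea concrete, the sketch below (a minimal, illustrative example, not taken from any specific paper in the list) shows the common pattern: discrete image tokens from a VQ-style tokenizer and text tokens share one vocabulary, and a single decoder-only transformer predicts the next token regardless of modality. The tokenizer is stubbed out with random codes, and all model sizes are placeholder values.

```python
# Minimal sketch of a unified autoregressive multimodal model (assumes PyTorch).
# The VQ-style image tokenizer is a stub; vocabulary sizes are illustrative only.
import torch
import torch.nn as nn

TEXT_VOCAB, IMAGE_CODEBOOK, D_MODEL = 32000, 8192, 512
VOCAB = TEXT_VOCAB + IMAGE_CODEBOOK  # shared vocabulary: text ids, then image code ids


def tokenize_image_stub(batch: int, n_patches: int = 256) -> torch.Tensor:
    """Placeholder for a VQ tokenizer: returns discrete image-token ids,
    offset so they occupy the image portion of the shared vocabulary."""
    return TEXT_VOCAB + torch.randint(0, IMAGE_CODEBOOK, (batch, n_patches))


class UnifiedARModel(nn.Module):
    """Single decoder-only transformer over interleaved text and image tokens."""

    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=8, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(D_MODEL, VOCAB)  # next-token logits, text or image

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        causal = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        hidden = self.backbone(self.embed(tokens), mask=causal)
        return self.head(hidden)


if __name__ == "__main__":
    text = torch.randint(0, TEXT_VOCAB, (2, 16))   # toy text prompt ids
    image = tokenize_image_stub(batch=2)           # toy discrete image tokens
    sequence = torch.cat([text, image], dim=1)     # one interleaved sequence
    logits = UnifiedARModel()(sequence)
    print(logits.shape)  # (2, 272, 40200): per-position logits over both modalities
```

Diffusion-based approaches covered in the list differ mainly in the generation head: instead of predicting the next discrete token, the LLM conditions a diffusion decoder that produces continuous image latents.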
Quick Start & Requirements
This repository is a curated list of research papers and does not have direct installation or execution commands. Users are expected to follow links provided within the paper entries to access code repositories and specific requirements.
Highlighted Details
Maintenance & Community
The repository (Purshow/Awesome-Unified-Multimodal) is maintained by Purshow, with contributions welcomed via pull requests or direct email. It encourages collaboration and discussion on new papers and research ideas.
Licensing & Compatibility
The repository itself is a collection of links and information; licensing is determined by the individual projects linked within it. Users should verify the licenses of any code or models they choose to use.
Limitations & Caveats
This is a curated list and does not provide direct access to models or code. Users must independently locate and evaluate the linked resources. The rapid pace of research means the list is constantly evolving, and some linked projects may be experimental or in early development stages.