Awesome-Unified-Multimodal-Models by showlab

Paper list for unified multimodal models

Created 1 year ago

779 stars

Top 45.1% on SourcePulse

Project Summary

This repository curates papers, code, and resources for unified multimodal models, an area often described as "Any-to-Any" generation. It targets researchers and developers who work on integrating multimodal understanding and generation into a single framework, and it serves as a centralized hub for advances in this rapidly evolving field.

How It Works

Unified multimodal models aim to replace the traditional split between separate models for multimodal understanding and for generation. A single framework both processes and generates content across modalities (text, image, audio, video, etc.), so one model can take input in one data type and produce output in another.
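As a rough illustration of the "single framework" idea, the sketch below shows one pattern that unified models can use: flattening inputs from several modalities into one shared sequence with boundary markers. It is a toy under stated assumptions, not the method of any model in the list. The names `Segment`, `interleave`, `split`, and the marker format are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical modality tags; real models define their own special tokens.
MODALITIES = ("text", "image", "audio", "video")


@dataclass
class Segment:
    modality: str      # one of MODALITIES
    tokens: List[int]  # discrete ids from an (assumed) modality-specific tokenizer


def interleave(segments: List[Segment]) -> List[str]:
    """Flatten mixed-modality segments into one sequence with boundary markers."""
    sequence: List[str] = []
    for seg in segments:
        if seg.modality not in MODALITIES:
            raise ValueError(f"unknown modality: {seg.modality}")
        sequence.append(f"<{seg.modality}>")
        sequence.extend(f"{seg.modality}:{t}" for t in seg.tokens)
        sequence.append(f"</{seg.modality}>")
    return sequence


def split(sequence: List[str]) -> List[Segment]:
    """Inverse of interleave: recover per-modality segments from one sequence."""
    segments: List[Segment] = []
    current: Optional[Segment] = None
    for item in sequence:
        if item.startswith("</"):
            # Closing marker: the current segment is complete.
            if current is None:
                raise ValueError("closing marker without an open segment")
            segments.append(current)
            current = None
        elif item.startswith("<"):
            # Opening marker: start a new segment for this modality.
            current = Segment(item[1:-1], [])
        else:
            if current is None:
                raise ValueError("token outside of a segment")
            _, token = item.split(":")
            current.tokens.append(int(token))
    return segments


prompt = [Segment("text", [5, 9]), Segment("image", [101, 102, 103])]
sequence = interleave(prompt)
```

Because every modality ends up in the same sequence, a single sequence model could in principle read and emit any mix of them. How each listed project actually tokenizes and models its modalities differs and should be checked in the individual papers.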

Quick Start & Requirements

This repository is a curated list of research papers and associated code, so there is no installation or execution command. Requirements depend on the individual projects linked from the list.

Highlighted Details

Comprehensive listing of recent (late 2023 to early 2025) unified multimodal models.
Covers a wide range of modalities including vision, language, audio, video, and motion.
Includes models built on various architectural approaches, such as diffusion, autoregression, and state space models.
Provides links to arXiv preprints and some conference publications.

Maintenance & Community

This project is ongoing and welcomes pull requests for suggestions, new papers, or corrections. Contributions can be made by editing and submitting a pull request, or by opening an issue. Users are encouraged to star the repository if they find it useful.

Licensing & Compatibility

The repository itself is not software and does not have a license. The licensing and compatibility of individual models and codebases listed within the repository will vary and must be checked on a per-project basis.

Limitations & Caveats

This is a curated list of research papers and not a runnable software project. The "code" mentioned refers to external repositories, which may have their own dependencies, licenses, and maintenance statuses. The list is actively growing, and some entries may represent very recent or experimental work.

Explore Similar Projects

Awesome-Multimodal-LLM by HenryHZY

SEED-X by AILab-CVC

Liquid by FoundationVision

Awesome_Matching_Pretraining_Transfering by Paranioar

PandaGPT by yxuansu

mPLUG-Owl by X-PLUG

NExT-GPT by NExT-GPT

Bagel by ByteDance-Seed

align-anything by PKU-Alignment

DeepSeek-VL by deepseek-ai

awesome-multimodal-ml by pliang279

Janus by deepseek-ai