Awesome-Unified-Multimodal-Models by showlab

Paper list for unified multimodal models

Created 1 year ago
688 stars

Top 49.4% on SourcePulse

Project Summary

This repository curates papers, code, and resources on unified multimodal models, including work on what is often termed "Any-to-Any" generation. It targets researchers and developers who integrate multimodal understanding and generation into a single framework, and it serves as a central index of work in this fast-moving field.

How It Works

Unified multimodal models aim to replace the traditional split between separate models for multimodal understanding and for generation. A single framework both processes and generates content across modalities (text, image, audio, video, etc.), so understanding and creation tasks across different data types share one model.

Quick Start & Requirements

This repository is a curated list of research papers and associated code. There is nothing to install or run. Requirements depend on the individual projects linked in the list.

Highlighted Details

  • Lists recent unified multimodal models (late 2023 to early 2025).
  • Covers a wide range of modalities including vision, language, audio, video, and motion.
  • Includes models built on different architectural approaches, such as diffusion, autoregression, and state space models.
  • Provides links to arXiv preprints and some conference publications.

Maintenance & Community

This project is ongoing. Suggestions, new papers, and corrections can be submitted as a pull request or raised by opening an issue. Users are encouraged to star the repository if they find it useful.

Licensing & Compatibility

The repository itself is not software and does not have a license. The licensing and compatibility of the individual models and codebases it lists vary and must be checked per project.

Limitations & Caveats

This is a curated list of research papers and not a runnable software project. The "code" mentioned refers to external repositories, which may have their own dependencies, licenses, and maintenance statuses. The list is actively growing, and some entries may represent very recent or experimental work.

Health Check

Last Commit: 1 month ago
Responsiveness: Inactive
Pull Requests (30d): 2
Issues (30d): 1

Star History

21 stars in the last 30 days
