Paper list for unified multimodal models
This repository curates papers, code, and resources for unified multimodal models, often termed "Any-to-Any" generation. It targets researchers and developers working on integrating multimodal understanding and generation tasks into single frameworks, offering a centralized hub for advancements in this rapidly evolving field.
How It Works
Unified multimodal models aim to close the gap between traditionally separate models for multimodal understanding and generation. They process and generate content across modalities (text, image, audio, video, etc.) within a single framework, so one model can both interpret and create different data types.
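As a rough illustration of the "any-to-any" idea, the sketch below shows what a unified interface might look like: one model accepts an interleaved mix of modalities and returns outputs in a requested modality. All class and method names are hypothetical and are not taken from any specific paper in this list; see the individual projects for real implementations.

```python
# Hypothetical sketch of an "any-to-any" unified model interface.
# Names and structure are illustrative only.
from dataclasses import dataclass
from typing import List, Union


@dataclass
class TextChunk:
    text: str


@dataclass
class ImageChunk:
    pixels: bytes  # e.g. an encoded PNG/JPEG


Modality = Union[TextChunk, ImageChunk]


class UnifiedModel:
    """A single model that both understands and generates across modalities."""

    def generate(self, inputs: List[Modality], target: str) -> List[Modality]:
        # Typical pipeline in the papers listed here (details vary widely):
        # 1. Encode every input chunk into a shared token space
        #    (text tokens, discrete or continuous image tokens, ...).
        # 2. Run one backbone (autoregressive transformer, diffusion, or a hybrid)
        #    over the interleaved sequence.
        # 3. Decode the output tokens back into the requested target modality.
        raise NotImplementedError("Placeholder; real models implement this differently.")


# Usage sketch: understanding (image -> text) and generation (text -> image)
# share the same entry point instead of requiring two separate models.
model = UnifiedModel()
# caption = model.generate([ImageChunk(pixels=b"...")], target="text")
# image   = model.generate([TextChunk(text="a cat on a skateboard")], target="image")
```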
Quick Start & Requirements
This repository is a curated list of research papers and associated code. There is no direct installation or execution command. Requirements are dependent on the individual projects linked within the list.
Maintenance & Community
This project is ongoing and welcomes pull requests for suggestions, new papers, or corrections. Contributions can be made by editing and submitting a pull request, or by opening an issue. Users are encouraged to star the repository if they find it useful.
Licensing & Compatibility
The repository itself is a curated list rather than software and does not include a license. The licensing and compatibility of the individual models and codebases it links to vary and must be checked on a per-project basis.
Limitations & Caveats
This is a curated list of research papers and not a runnable software project. The "code" mentioned refers to external repositories, which may have their own dependencies, licenses, and maintenance statuses. The list is actively growing, and some entries may represent very recent or experimental work.