Curated paper list on LLMs for multimodal generation/editing
This repository is a curated survey of research papers focusing on Large Language Models (LLMs) applied to multimodal generation and editing across visual (image, video, 3D) and audio domains. It serves as a comprehensive resource for researchers and practitioners in the field of generative AI, aiming to consolidate the rapidly evolving landscape of LLM-powered multimodal creation.
How It Works
The repository organizes papers by modality and task (generation, editing, agents, understanding, safety), providing a structured overview of the field. It categorizes papers into "LLM-based" and "Non-LLM-based" approaches, highlighting the specific role LLMs play in driving multimodal outputs. The content is presented as a browsable list, allowing users to quickly navigate to areas of interest.
Quick Start & Requirements
This repository is a curated list of research papers; there is no code to install or run. The only requirement is access to the linked papers themselves.
Maintenance & Community
The repository is led by Yingqing He and Zhaoyang Liu, with contributors organized by modality. The project welcomes contributions via pull requests or comments.
Licensing & Compatibility
The repository itself is a collection of links to research papers and carries no explicit software license. Licensing of the individual papers depends on their respective publication venues.
Limitations & Caveats
This repository is a survey and does not provide code implementations or direct access to models. Readers must consult the original papers for details on methodology, datasets, and performance.