YingqingHe: Curated paper list on LLMs for multimodal generation/editing
Top 59.0% on SourcePulse
This repository is a curated survey of research papers on Large Language Models (LLMs) applied to multimodal generation and editing across visual (image, video, 3D) and audio domains. It serves as a reference for researchers and practitioners in generative AI, consolidating the rapidly evolving landscape of LLM-powered multimodal creation.
How It Works
The repository organizes papers by modality and task (generation, editing, agents, understanding, safety), providing a structured overview of the field. Papers are further split into "LLM-based" and "Non-LLM-based" approaches, highlighting the specific role LLMs play in producing multimodal outputs. The content is presented as a browsable list, allowing users to navigate quickly to areas of interest.
Quick Start & Requirements
This repository is a curated list of research papers and does not involve direct code execution or installation. All requirements are related to accessing and reading the research papers themselves.
Maintenance & Community
The repository is led by Yingqing He and Zhaoyang Liu, with contributions from a team listed by modality. The project welcomes contributions via pull requests or comments.
Licensing & Compatibility
The repository itself is a collection of links to research papers and does not declare a software license. The licensing of individual papers depends on their respective publication venues.
Limitations & Caveats
This repository is a survey and does not provide code implementations or direct access to models. Users must consult the original papers for details on methodologies, datasets, and performance.
Last updated 10 months ago; the repository is currently inactive.