Curated paper list on LLMs for multimodal generation/editing
This repository is a curated survey of research papers focusing on Large Language Models (LLMs) applied to multimodal generation and editing across visual (image, video, 3D) and audio domains. It serves as a comprehensive resource for researchers and practitioners in the field of generative AI, aiming to consolidate the rapidly evolving landscape of LLM-powered multimodal creation.
How It Works
The repository organizes papers by modality and task (generation, editing, agents, understanding, safety), providing a structured overview of the field. It categorizes papers into "LLM-based" and "Non-LLM-based" approaches, highlighting the specific role LLMs play in driving multimodal outputs. The content is presented as a browsable list, allowing users to quickly navigate to areas of interest.
Quick Start & Requirements
This repository is a curated list of research papers; there is no code to install or run. The only requirement is access to the linked papers themselves.
Maintenance & Community
The repository is led by Yingqing He and Zhaoyang Liu, with contributors organized by modality. The project welcomes contributions via pull requests or comments.
Licensing & Compatibility
The repository itself is a collection of links to research papers and carries no explicit software license. Licensing of the individual papers depends on their respective publication venues.
Limitations & Caveats
This repository is a survey and does not provide code implementations or direct access to models. Readers must consult the original papers for details on methodology, datasets, and performance.