Awesome-LLMs-meet-Multimodal-Generation by YingqingHe

Curated paper list on LLMs for multimodal generation/editing

created 1 year ago
494 stars

Top 63.5% on sourcepulse

Project Summary

This repository is a curated list of research papers on Large Language Models (LLMs) applied to multimodal generation and editing across the visual (image, video, 3D) and audio domains. It aims to give generative AI researchers and practitioners a single reference point for the rapidly evolving work on LLM-powered multimodal creation.

How It Works

The repository organizes papers by modality and task (generation, editing, agents, understanding, safety), providing a structured overview of the field. It categorizes papers into "LLM-based" and "Non-LLM-based" approaches, highlighting the specific role LLMs play in driving multimodal outputs. The content is presented as a browsable list, allowing users to quickly navigate to areas of interest.

Quick Start & Requirements

This repository is a curated list of research papers, so there is nothing to install or run. The only requirement is access to the linked papers.

Highlighted Details

  • Comprehensive coverage of LLM applications in image, video, 3D, and audio generation and editing.
  • Categorization into LLM-based and non-LLM-based approaches for clear comparison.
  • Includes sections on multimodal agents, understanding, and safety, providing a holistic view.
  • Updated with recent publications, as indicated by the recent dates on many entries.

Maintenance & Community

The repository is led by Yingqing He and Zhaoyang Liu, with contributions from a team listed by modality. The project welcomes contributions via pull requests or comments.

Licensing & Compatibility

The repository is a collection of links to research papers and does not specify a software license. Licensing of the individual papers depends on their respective publication venues.

Limitations & Caveats

This repository is a survey and does not provide code implementations or direct access to models. Users must refer to the original papers for details on specific methodologies, datasets, and performance.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 28 stars in the last 90 days