Awesome-LLMs-meet-Multimodal-Generation  by YingqingHe

Curated paper list on LLMs for multimodal generation/editing

Created 1 year ago
509 stars

Top 61.4% on SourcePulse

Project Summary

This repository is a curated list of research papers on Large Language Models (LLMs) applied to multimodal generation and editing across visual (image, video, 3D) and audio domains. It is intended as a reference for researchers and practitioners in generative AI who want a consolidated view of LLM-powered multimodal creation.

How It Works

The repository organizes papers by modality and task (generation, editing, agents, understanding, safety), providing a structured overview of the field. It categorizes papers into "LLM-based" and "Non-LLM-based" approaches, highlighting the specific role LLMs play in driving multimodal outputs. The content is presented as a browsable list, allowing users to quickly navigate to areas of interest.

Quick Start & Requirements

This repository is a curated list of research papers, so there is nothing to install or run. The only requirement is access to the linked papers.

Highlighted Details

  • Comprehensive coverage of LLM applications in image, video, 3D, and audio generation and editing.
  • Categorization into LLM-based and non-LLM-based approaches for clear comparison.
  • Includes sections on multimodal agents, understanding, and safety, providing a holistic view.
  • Regularly updated with recent publications, indicated by recent dates on many entries.

Maintenance & Community

The repository is led by Yingqing He and Zhaoyang Liu, with contributions from a team listed by modality. The project welcomes contributions via pull requests or comments.

Licensing & Compatibility

The repository itself is a collection of links to research papers and does not have a specific software license. The licensing of the individual papers would depend on their respective publication venues.

Limitations & Caveats

This repository is a survey and does not provide code implementations or direct access to models. Users must refer to the original papers for details on specific methodologies, datasets, and performance.

Health Check

  • Last Commit: 5 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

13 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems") and Elvis Saravia (founder of DAIR.AI).

NExT-GPT by NExT-GPT

0.1%
4k
Any-to-any multimodal LLM research paper
Created 2 years ago
Updated 4 months ago