Awesome-Multimodal-LLM  by HenryHZY

Collection of research trends in LLM-guided multimodal learning

created 2 years ago
357 stars

Top 79.4% on sourcepulse

Project Summary

This repository serves as a curated list of research trends and papers in LLM-guided multimodal learning, focusing on integrating text, vision, and audio modalities. It targets researchers and practitioners in AI and NLP seeking to understand the landscape of multimodal large language models, their architectures, and evaluation methods. The primary benefit is a comprehensive overview of the rapidly evolving field, providing pointers to key models and techniques.

How It Works

The project categorizes research based on LLM backbones (e.g., LLaMA, Vicuna, OPT, T5) and learning techniques (e.g., full fine-tuning, LoRA, in-context learning). It highlights models that leverage LLMs to process and generate multimodal content, often by combining a frozen vision encoder with a language model backbone. This approach allows LLMs to understand and reason about visual or other non-textual data, enabling new capabilities in areas like visual question answering and image captioning.
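The two-axis categorization described above (LLM backbone and learning technique) can be pictured as a small catalog grouped along either axis. The following is only an illustrative sketch: the `PaperEntry` class, its field names, the `group_by` helper, and the placeholder entries are all assumptions made for this example and do not come from the repository.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PaperEntry:
    """One row of a hypothetical catalog; the field names are assumptions."""
    title: str
    backbone: str   # e.g. "LLaMA", "Vicuna", "OPT", "T5"
    technique: str  # e.g. "full fine-tuning", "LoRA", "in-context learning"


def group_by(entries, key):
    """Group entry titles by one attribute name ('backbone' or 'technique')."""
    groups = defaultdict(list)
    for entry in entries:
        groups[getattr(entry, key)].append(entry.title)
    return dict(groups)


# Placeholder entries for illustration only; these are not real papers from the list.
CATALOG = [
    PaperEntry("Example Paper A", "LLaMA", "LoRA"),
    PaperEntry("Example Paper B", "Vicuna", "full fine-tuning"),
    PaperEntry("Example Paper C", "LLaMA", "in-context learning"),
    PaperEntry("Example Paper D", "T5", "LoRA"),
]

by_backbone = group_by(CATALOG, "backbone")
by_technique = group_by(CATALOG, "technique")

print(by_backbone["LLaMA"])
print(by_technique["LoRA"])
```

Grouping by attribute name keeps the same catalog browsable along either axis without duplicating entries.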

Quick Start & Requirements

This is a curated list of research papers and does not have a direct installation or execution command. The "code" links provided for each paper point to individual project repositories, which will have their own specific setup instructions and dependencies (e.g., Python, PyTorch, specific LLM checkpoints, CUDA).

Highlighted Details

  • Comprehensive listing of multimodal LLM research from 2021 to August 2023.
  • Covers a wide range of LLM backbones and learning techniques.
  • Includes examples of both multimodal LLMs (e.g., MiniGPT-4, LLaVA) and evaluation methods (e.g., POPE, MultiInstruct).
  • Provides links to papers and code for each listed research contribution.

Maintenance & Community

The repository is open for contributions via pull requests, encouraging community updates on new research and trends. Users can also tag a Twitter handle for interesting news.

Licensing & Compatibility

The repository itself is a list and does not impose a license. Individual code repositories linked within will have their own licenses, which may vary and could include restrictions on commercial use.

Limitations & Caveats

This resource is a snapshot of research trends through August 2023 and does not include later developments. The "code" links point to external repositories, each with its own setup complexity and maintenance status.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 90 days
