Awesome-Multimodal-Large-Language-Models by yfzhang114

Collection of resources for multimodal LLMs, LLMs, and diffusion models

Created 2 years ago

1,176 stars

Top 32.3% on SourcePulse

Project Summary

This repository serves as a curated collection of reading notes and resources for Multimodal Large Language Models (MLLMs), Large Language Models (LLMs), and Diffusion Models. It is primarily aimed at researchers and engineers interested in the latest advancements, benchmarks, and alignment techniques in the MLLM space, offering a structured overview of key papers and concepts.

How It Works

The repository organizes information by topic, including surveys, specific MLLM architectures, benchmarks, datasets, and alignment strategies (such as RLHF and DPO). It links to numerous research papers, often with accompanying reading notes that cover the technical details, methodologies, and claimed improvements of the models and techniques. The content is regularly updated with new research.

Quick Start & Requirements

This repository is a collection of curated links and notes, not a runnable software package. The linked code repositories and datasets have their own installation and hardware requirements, often including significant GPU resources and specific software dependencies.

Highlighted Details

Comprehensive surveys on MLLM evaluation and RLHF for MLLMs.
Introductions to key MLLM architectures like LLaVA, InternVL, Qwen-VL, and CogVLM.
Datasets and benchmarks such as MME-RealWorld, MMMU-Pro, and OBELICS.
Detailed explanations of alignment techniques including DPO, RLHF, and RLAIF-V.
Focus on high-resolution image processing and efficient multimodal handling.

Maintenance & Community

The repository is maintained by yfzhang114, a Ph.D. student with research experience at Microsoft and Alibaba. Updates are frequent, with recent additions focusing on MLLM evaluation benchmarks and alignment surveys. Links to personal homepages and contact information are provided for collaboration.

Licensing & Compatibility

The repository itself is a collection of links and notes and does not specify a license. Licensing of the linked code repositories and datasets varies and must be checked individually.

Limitations & Caveats

As a curated list of resources, this repository does not provide direct functionality. Users must navigate to external links for code, datasets, and further details, each with its own potential setup complexities and compatibility requirements. The sheer volume of information may require significant effort to fully digest.

Health Check

Last Commit

1 month ago

Responsiveness

1 day

Star History

29 stars in the last 30 days