awesome-multimodal-ml by pliang279

Curated reading list for multimodal ML research

Created 6 years ago

6,786 stars

Top 7.5% on SourcePulse

5 Experts Love This Project

Aravind Srinivas

Cofounder of Perplexity

John Yang

Coauthor of SWE-bench, SWE-agent

Chenlin Meng

Cofounder of Pika

Binyuan Hui

Research Scientist at Alibaba Qwen

and 1 more!

Project Summary

This repository is a curated reading list for multimodal machine learning, serving researchers, engineers, and students interested in the intersection of different data modalities like vision, language, and audio. It provides a structured overview of core concepts, architectures, applications, and datasets, aiming to guide users through the rapidly evolving field.

How It Works

The list is organized thematically, covering foundational areas such as multimodal representations, fusion, alignment, and pretraining, alongside advanced topics like generative learning, bias analysis, and human-in-the-loop systems. It links to seminal papers, recent advancements, relevant datasets, and influential courses, offering a comprehensive knowledge base.

Quick Start & Requirements

This is a reading list, not a software library. It requires no installation; a web browser is enough. The list links to official course materials, tutorials, and datasets for deeper exploration.

Highlighted Details

Extensive coverage of core multimodal ML areas: representations, fusion, alignment, pretraining, translation, retrieval, co-learning, and handling missing modalities.
Detailed sections on architectures (Transformers, Memory Networks), applications (VQA, grounding, navigation, translation, dialogue), and datasets.
Includes links to numerous survey papers, influential research papers with code, and academic courses from leading institutions.
Features sections on emerging topics like bias/fairness, interpretability, commonsense reasoning, and multimodal reinforcement learning.

Maintenance & Community

The list is maintained by Paul Liang, with contributions welcomed from the community. It references resources from major conferences (CVPR, NeurIPS, ACL, etc.) and academic institutions (CMU, Stanford, MIT), indicating strong ties to the research community.

Licensing & Compatibility

As a reading list, it does not have a software license. The linked resources are subject to their respective licenses.

Limitations & Caveats

The list is a snapshot of the field and may omit papers published after its last update. As a curated list, its selection of papers reflects the curator's perspective.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

40 stars in the last 30 days