Curated reading list for multimodal ML research
This repository is a curated reading list for multimodal machine learning, serving researchers, engineers, and students interested in the intersection of different data modalities like vision, language, and audio. It provides a structured overview of core concepts, architectures, applications, and datasets, aiming to guide users through the rapidly evolving field.
How It Works
The list is organized thematically. It covers foundational areas such as multimodal representations, fusion, alignment, and pretraining, alongside advanced topics like generative learning, bias analysis, and human-in-the-loop systems. Each theme links to seminal papers, recent work, relevant datasets, and influential courses.
Quick Start & Requirements
This is a reading list, not a software library. No installation or specific requirements are needed beyond a web browser and an interest in the field. Links to official course materials, tutorials, and datasets are provided within the list for deeper exploration.
Maintenance & Community
The list is maintained by Paul Liang, with contributions welcomed from the community. It references resources from major conferences (CVPR, NeurIPS, ACL, etc.) and academic institutions (CMU, Stanford, MIT), indicating strong ties to the research community.
Licensing & Compatibility
As a reading list, it does not have a software license. The linked resources are subject to their respective licenses.
Limitations & Caveats
The list is a snapshot of the field and does not include papers published after its last update. Because it is curated, the selection of papers reflects the curator's perspective.
Last activity: 11 months ago. Status: Inactive.