Curated paper list for multimodal AI research
This repository serves as a curated list of resources and research papers focused on large multi-modality models (LMMM), parameter-efficient fine-tuning (PEFT), and vision-language pretraining (VLP). It aims to give researchers and practitioners a structured overview of these rapidly evolving fields.
How It Works
The project organizes academic papers and related resources into distinct categories: LMMM (further broken down by perception, generation, and unification), PEFT methods (such as prompt tuning and adapter tuning), VLP (image-language and video-language pretraining), and conventional image-text matching techniques. This categorization supports a structured exploration of the landscape, highlighting key concepts, datasets, and learning paradigms.
Quick Start & Requirements
This repository is a curated list of papers and does not contain executable code. No installation or specific requirements are necessary to browse the content.
Maintenance & Community
The project is maintained by Paranioar, and updates are logged; the last update was noted on 2024.12.15. Contact is available via email at r1228240468@gmail.com.
Licensing & Compatibility
The repository is released under the MIT license, permitting broad use and modification.
Limitations & Caveats
The project is a static list of papers and does not provide implementations or code. Updates to the LMMM section are ongoing and may not be fully comprehensive as of the last log entry.