Curated list of foundation models for vision/language tasks
Top 36.1% on sourcepulse
This repository is a curated list of foundation models for vision and language tasks, aimed at AI researchers and practitioners. It consolidates significant advances into a structured overview of papers with accompanying code, helping accelerate research and development in multimodal AI.
How It Works
The project functions as a living bibliography, cataloging research papers that introduce or significantly advance foundation models. Entries are organized by year and topic, with priority given to work that has publicly available code, so the list remains practically useful to the AI community. The curation emphasizes both seminal works and recent breakthroughs, offering a historical perspective alongside a snapshot of the current state of the art.
Quick Start & Requirements
This repository is a curated list, so there are no installation or execution steps. A web browser is all that is needed to browse and follow the listed resources.
Maintenance & Community
The repository is actively maintained, with frequent updates reflecting the rapid pace of foundation model research. It lists numerous contributing institutions and researchers from leading AI labs and universities globally. Links to related communities and resources are provided.
Licensing & Compatibility
The repository itself is distributed under permissive terms (e.g., MIT), allowing broad use. However, the licensing of the individual models and codebases it links to varies significantly and must be checked on a per-project basis.
Limitations & Caveats
The list is a curated selection and is not exhaustive. While it prioritizes papers with code, the availability and quality of linked code vary. Given the rapid evolution of the field, some entries may become dated quickly.