Curated list of foundation and multimodal models
This repository curates foundation and multimodal AI models and serves as a reference for researchers and developers exploring advanced AI capabilities. It provides a structured overview of recent models that integrate vision, language, and audio and that support a wide range of downstream tasks.
How It Works
The list categorizes models by their supported modalities (vision, language, audio) and primary tasks, such as object detection, segmentation, and text-to-audio generation. Each entry includes key details like publication date, associated papers, code repositories, and example usage, facilitating quick evaluation and adoption.
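To show how modality-and-task categorization supports quick lookup, here is a minimal sketch that assumes entries are loaded into a hypothetical `ModelEntry` record. The schema, the field names, the `filter_entries` helper, and the placeholder model names are all illustrative assumptions. The repository itself is a list, not a dataset with this structure.

```python
from dataclasses import dataclass


# Hypothetical schema: field names are illustrative, not taken from the repository.
@dataclass(frozen=True)
class ModelEntry:
    name: str
    modalities: frozenset[str]  # e.g. {"vision", "language"}
    tasks: frozenset[str]       # e.g. {"object detection"}
    published: str              # "YYYY-MM", so string order matches date order
    paper_url: str = ""
    code_url: str = ""


def filter_entries(entries, modalities=frozenset(), task=None):
    """Return entries that support every requested modality (and the task, if given), newest first."""
    selected = [
        e for e in entries
        if modalities <= e.modalities and (task is None or task in e.tasks)
    ]
    return sorted(selected, key=lambda e: e.published, reverse=True)


# Placeholder entries; these are not real models from the list.
catalog = [
    ModelEntry("model-a", frozenset({"vision", "language"}), frozenset({"segmentation"}), "2023-04"),
    ModelEntry("model-b", frozenset({"language", "audio"}), frozenset({"text-to-audio generation"}), "2023-01"),
    ModelEntry("model-c", frozenset({"vision", "language"}), frozenset({"object detection"}), "2023-08"),
]

names = [e.name for e in filter_entries(catalog, frozenset({"vision"}))]
print(names)  # ['model-c', 'model-a']
```

Using frozen sets makes the modality check a simple subset test. Storing dates as `YYYY-MM` strings lets "newest first" be a plain string sort.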
Quick Start & Requirements
This is a curated list, not a runnable codebase. Users will need to refer to individual model repositories for installation and execution instructions.
Highlighted Details
Maintenance & Community
The repository is community-driven, actively seeking contributions for new models and improvements via issues and pull requests.
Licensing & Compatibility
Licensing information is not provided for the curated list itself. Users must consult the individual model repositories for their respective licenses and compatibility.
Limitations & Caveats
This is a reference list and does not provide a unified API or execution environment. Users must independently manage dependencies and setup for each model.