awesome-foundation-and-multimodal-models by SkalskiP

Curated list of foundation and multimodal models

Created 1 year ago
634 stars

Top 52.3% on SourcePulse

Project Summary

This repository is a curated list of foundation and multimodal AI models, intended as a reference for researchers and developers. It gives a structured overview of models that combine vision, language, and audio and that support a wide range of downstream tasks.

How It Works

The list categorizes models by their supported modalities (vision, language, audio) and primary tasks, such as object detection, segmentation, and text-to-audio generation. Each entry includes key details like publication date, associated papers, code repositories, and example usage, facilitating quick evaluation and adoption.
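The entry structure described above can be pictured as a small record type plus a filter. The following is only a minimal sketch: the `ModelEntry` fields, the `filter_entries` helper, and all three sample entries (names, dates, and example.com URLs) are hypothetical illustrations, not data or code taken from the repository, which is a plain list rather than a library.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class ModelEntry:
    """One row of a curated model list (hypothetical schema)."""
    name: str
    modalities: FrozenSet[str]   # e.g. {"vision", "language"}
    task: str                    # primary task, e.g. "speech recognition"
    published: date
    paper_url: str
    code_url: str


def filter_entries(
    entries: Iterable[ModelEntry],
    modality: Optional[str] = None,
    task: Optional[str] = None,
) -> List[ModelEntry]:
    """Return entries matching the modality and/or task, newest first."""
    matches = [
        e for e in entries
        if (modality is None or modality in e.modalities)
        and (task is None or e.task == task)
    ]
    return sorted(matches, key=lambda e: e.published, reverse=True)


# Placeholder entries for illustration only; none of these are real models.
ENTRIES = [
    ModelEntry("example-captioner", frozenset({"vision", "language"}),
               "image captioning", date(2023, 4, 1),
               "https://example.com/paper-b", "https://example.com/code-b"),
    ModelEntry("example-transcriber", frozenset({"audio", "language"}),
               "speech recognition", date(2022, 9, 1),
               "https://example.com/paper-c", "https://example.com/code-c"),
    ModelEntry("example-detector", frozenset({"vision", "language"}),
               "zero-shot object detection", date(2024, 1, 1),
               "https://example.com/paper-a", "https://example.com/code-a"),
]

if __name__ == "__main__":
    for entry in filter_entries(ENTRIES, modality="vision"):
        print(entry.published.isoformat(), entry.name, "-", entry.task)
```

Keeping modalities as a set rather than a single label lets one entry appear under several categories, which matches how the list groups vision-language and audio-language models.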

Quick Start & Requirements

This is a curated list, not a runnable codebase. Users will need to refer to individual model repositories for installation and execution instructions.

Highlighted Details

  • Covers models published from late 2021 to early 2024.
  • Includes models with diverse modality combinations: vision-only, vision-language, and audio-language.
  • Features prominent models like YOLO-World, Depth Anything, Segment Anything, LLaVA, and Whisper.
  • Lists tasks ranging from zero-shot object detection and segmentation to image captioning and speech recognition.

Maintenance & Community

The repository is community-driven and invites contributions of new models and improvements through issues and pull requests.

Licensing & Compatibility

Licensing information is not provided for the curated list itself. Users must consult the individual model repositories for their respective licenses and compatibility.

Limitations & Caveats

This is a reference list and does not provide a unified API or execution environment. Users must independently manage dependencies and setup for each model.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days
