awesome-foundation-and-multimodal-models by SkalskiP

Curated list of foundation and multimodal models

Created 2 years ago

637 stars

Top 51.4% on SourcePulse

2 Experts Love This Project

Omar Sanseviero

DevRel at Google DeepMind

Andreas Jansson

Cofounder of Replicate

Project Summary

This repository curates foundation and multimodal AI models as a reference for researchers and developers. It gives a structured overview of models that combine vision, language, and audio and that support a wide range of downstream tasks.

How It Works

The list categorizes models by their supported modalities (vision, language, audio) and primary tasks, such as object detection, segmentation, and text-to-audio generation. Each entry includes key details like publication date, associated papers, code repositories, and example usage, facilitating quick evaluation and adoption.
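The categorization described above can be pictured as a small record per model plus a filter over modalities and tasks. The following is only a minimal sketch under stated assumptions: the `ModelEntry` class, its field names, and the helper functions are hypothetical and are not the repository's actual format (the list is a document, not a data file). The modality and task labels are illustrative, and the date, paper, and code fields are left empty because they must be looked up in the list itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelEntry:
    """One entry of the list. Field names are hypothetical, not the repository's schema."""
    name: str
    modalities: frozenset
    tasks: tuple
    published: Optional[str] = None  # intentionally empty: consult the list
    paper: Optional[str] = None      # intentionally empty: consult the list
    code: Optional[str] = None       # intentionally empty: consult the list


# Illustrative labels only; the list's own categorization may differ.
ENTRIES = [
    ModelEntry("YOLO-World", frozenset({"vision", "language"}), ("zero-shot object detection",)),
    ModelEntry("Depth Anything", frozenset({"vision"}), ("depth estimation",)),
    ModelEntry("Segment Anything", frozenset({"vision"}), ("segmentation",)),
    ModelEntry("LLaVA", frozenset({"vision", "language"}), ("image captioning",)),
    ModelEntry("Whisper", frozenset({"audio", "language"}), ("speech recognition",)),
]


def by_modalities(entries, required):
    """Return names of entries whose modalities include every required one."""
    required = frozenset(required)
    return [e.name for e in entries if required <= e.modalities]


def by_task(entries, keyword):
    """Return names of entries with at least one task containing the keyword."""
    keyword = keyword.lower()
    return [e.name for e in entries if any(keyword in t.lower() for t in e.tasks)]


if __name__ == "__main__":
    print(by_modalities(ENTRIES, {"vision", "language"}))
    print(by_task(ENTRIES, "segmentation"))
```

Using a set for modalities keeps the filter to a simple subset test, so a query for vision also matches vision-language entries.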

Quick Start & Requirements

This is a curated list, not a runnable codebase. Users will need to refer to individual model repositories for installation and execution instructions.

Highlighted Details

Covers models published from late 2021 to early 2024.
Includes models with diverse modality combinations: vision-only, vision-language, and audio-language.
Features prominent models like YOLO-World, Depth Anything, Segment Anything, LLaVA, and Whisper.
Lists tasks ranging from zero-shot object detection and segmentation to image captioning and speech recognition.

Maintenance & Community

The repository is community-driven and invites contributions of new models and improvements via issues and pull requests.

Licensing & Compatibility

Licensing information is not provided for the curated list itself. Users must consult the individual model repositories for their respective licenses and compatibility.

Limitations & Caveats

This is a reference list and does not provide a unified API or execution environment. Users must independently manage dependencies and setup for each model.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days