Awesome-CV-Foundational-Models  by awaisrauf

Curated list of foundational CV models accompanying a vision-language survey paper

Created 2 years ago
529 stars

Top 59.9% on SourcePulse

Project Summary

This repository is a curated list of foundational models in computer vision that accompanies a survey paper on the topic. It gives researchers and practitioners a comprehensive overview of emerging vision models that use multimodal data and large-scale training to improve reasoning, generalization, and prompting capabilities.

How It Works

The repository organizes foundational models based on their architectural designs, training objectives (contrastive, generative), pre-training datasets, and prompting patterns (textual, visual, heterogeneous). It highlights models that bridge modalities like vision, text, and audio, enabling capabilities such as zero-shot learning and prompt-based manipulation of visual outputs.
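To make the organizing axes above concrete, here is a minimal sketch of how such a catalog could be represented and filtered. It is an illustration only, not the repository's actual format: the `ModelEntry` class, the `filter_entries` helper, and the `example-model-*` entries are all hypothetical, and only the category names (contrastive, generative, textual, visual, heterogeneous) come from the description above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Objective(Enum):
    CONTRASTIVE = "contrastive"
    GENERATIVE = "generative"


class PromptType(Enum):
    TEXTUAL = "textual"
    VISUAL = "visual"
    HETEROGENEOUS = "heterogeneous"


@dataclass(frozen=True)
class ModelEntry:
    """One catalog row (hypothetical schema, not the repository's)."""
    name: str
    objective: Objective
    prompt_type: PromptType
    modalities: Tuple[str, ...]


def filter_entries(
    entries: List[ModelEntry],
    objective: Optional[Objective] = None,
    prompt_type: Optional[PromptType] = None,
) -> List[ModelEntry]:
    """Return entries matching every criterion that is not None."""
    return [
        e for e in entries
        if (objective is None or e.objective is objective)
        and (prompt_type is None or e.prompt_type is prompt_type)
    ]


# Placeholder entries for illustration; they do not describe real models.
CATALOG = [
    ModelEntry("example-model-a", Objective.CONTRASTIVE,
               PromptType.TEXTUAL, ("vision", "text")),
    ModelEntry("example-model-b", Objective.GENERATIVE,
               PromptType.VISUAL, ("vision",)),
    ModelEntry("example-model-c", Objective.GENERATIVE,
               PromptType.HETEROGENEOUS, ("vision", "text", "audio")),
]

# List the names of all entries trained with a generative objective.
generative = filter_entries(CATALOG, objective=Objective.GENERATIVE)
print([e.name for e in generative])
```

Keeping each axis as its own enum means a reader can combine criteria, for example generative models that accept heterogeneous prompts, without the catalog needing a separate list per category.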

Quick Start & Requirements

This repository is a collection of links and information about foundational models, not a runnable codebase itself. Users are directed to individual project pages for installation and usage instructions.

Highlighted Details

  • Comprehensive review of foundational models in computer vision, covering architecture, training, and prompting.
  • Discussion of open challenges and future research directions in the field.
  • Links to numerous seminal papers and their associated codebases.

Maintenance & Community

The repository accompanies a survey paper accepted for publication in TPAMI. Contributions of relevant new works are welcome via pull requests.

Licensing & Compatibility

The licensing of individual models linked within this repository varies. Users should consult the specific licenses of each project.

Limitations & Caveats

This repository is a curated list and does not provide direct code execution or support. Users must refer to the individual project pages for model-specific details and functionality.

Health Check

Last Commit: 10 months ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 5 stars in the last 30 days
