Vision-language survey paper with curated list of foundational CV models
This repository serves as a curated list of foundational models in computer vision, supplementing a survey paper on the topic. It aims to give researchers and practitioners a comprehensive overview of emerging vision models that leverage multimodal data and large-scale training for improved reasoning, generalization, and prompting capabilities.
How It Works
The repository organizes foundational models by their architectural designs, training objectives (contrastive or generative), pre-training datasets, and prompting patterns (textual, visual, heterogeneous). It highlights models that bridge modalities such as vision, text, and audio, enabling capabilities like zero-shot learning and prompt-based manipulation of visual outputs.
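To illustrate what "zero-shot learning via textual prompts" means for the contrastive models surveyed here, the sketch below classifies an image by comparing it against natural-language prompts with a CLIP-style model loaded through the Hugging Face transformers library. The checkpoint name, image path, and class prompts are illustrative assumptions, not recommendations of any specific entry in the list.

    # Minimal sketch: zero-shot image classification with a contrastive
    # vision-language model (CLIP-style). Checkpoint, image path, and
    # prompts are illustrative assumptions, not part of this repository.
    from PIL import Image
    import torch
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    image = Image.open("example.jpg")                   # any local image
    prompts = ["a photo of a cat", "a photo of a dog"]  # text prompts define the classes

    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        outputs = model(**inputs)

    # Image-text similarity scores; softmax gives per-prompt probabilities.
    probs = outputs.logits_per_image.softmax(dim=-1)
    print(dict(zip(prompts, probs[0].tolist())))

Swapping the prompt list changes the label set without any retraining, which is the prompt-driven behavior these foundational models are curated for.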
Quick Start & Requirements
This repository is a collection of links and information about foundational models, not a runnable codebase itself. Users are directed to individual project pages for installation and usage instructions.
Maintenance & Community
The repository is associated with a survey paper accepted for publication in IEEE TPAMI. Contributions of relevant new works are encouraged via pull requests.
Licensing & Compatibility
The licensing of individual models linked within this repository varies. Users should consult the specific licenses of each project.
Limitations & Caveats
This repository is a curated list and does not provide direct code execution or support. Users must refer to the individual project pages for model-specific details and functionality.