VLM_survey  by jingyi0000

VLM survey paper with links to models/methods for vision tasks

created 2 years ago
2,857 stars

Top 17.0% on sourcepulse

Project Summary

This repository serves as a comprehensive survey of Vision-Language Models (VLMs) applied to various visual recognition tasks, including image classification, object detection, and semantic segmentation. It targets researchers and practitioners in computer vision and natural language processing, offering a structured overview of VLMs, their pre-training methods, transfer learning techniques, and knowledge distillation strategies. The project aims to consolidate and categorize the rapidly evolving field of VLMs for vision tasks.

How It Works

The repository is structured around a survey paper, "Vision-Language Models for Vision Tasks: A Survey," which systematically categorizes VLMs based on their application in visual recognition. It details pre-training methodologies (contrastive, generative, alignment), transfer learning approaches (prompt tuning, adapters), and knowledge distillation techniques. The survey also lists relevant datasets for both pre-training and evaluation across various vision tasks.
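As an illustration of the contrastive pre-training category mentioned above, here is a minimal sketch of a symmetric image-text contrastive loss in the style popularized by CLIP. It is not taken from the repository or the survey. The function names, the toy embeddings, and the temperature of 0.07 are assumptions chosen for illustration, and the code uses only the Python standard library so it can be run as-is.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric cross-entropy over image-text similarities.

    Assumes image_embs[k] and text_embs[k] form the k-th matched pair;
    every other pairing in the batch is treated as a negative.
    The temperature value is an illustrative assumption.
    """
    imgs = [normalize(v) for v in image_embs]
    txts = [normalize(v) for v in text_embs]
    n = len(imgs)

    # logits[i][j] = cosine similarity of image i and text j, scaled by temperature
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]

    # Image-to-text direction: each row should put its mass on the diagonal entry.
    i2t = -sum(log_softmax(logits[k])[k] for k in range(n)) / n

    # Text-to-image direction: same idea over the columns.
    columns = [[logits[r][c] for r in range(n)] for c in range(n)]
    t2i = -sum(log_softmax(columns[k])[k] for k in range(n)) / n

    return 0.5 * (i2t + t2i)


# Toy batch of two pairs: aligned embeddings vs. deliberately swapped texts.
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the aligned pairs produce a much lower loss than the swapped ones, which is the behavior the objective is meant to reward. The methods listed in the survey differ in their encoders, batch construction, and additional objectives.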

Quick Start & Requirements

This repository is a curated list of papers, so there is nothing to install or run. No special software is needed to view it.

Highlighted Details

  • Comprehensive categorization of VLM pre-training methods, transfer learning techniques, and knowledge distillation strategies.
  • Extensive lists of datasets used for VLM pre-training and evaluation across image classification, object detection, semantic segmentation, and more.
  • Includes links to papers and code repositories for each listed VLM method.
  • Features recent advancements, with many papers from NeurIPS 2024 and CVPR 2024.

Maintenance & Community

The project is maintained by jingyi0000 and welcomes contributions via pull requests for missing papers. The last update was on March 24, 2025.

Licensing & Compatibility

The repository itself does not specify a license. The linked papers and code repositories carry their own licenses.

Limitations & Caveats

This repository is a survey and does not provide executable code or models. Its value is in its curated information and links to external resources.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

183 stars in the last 90 days
