VLM survey paper with links to models/methods for vision tasks
This repository serves as a comprehensive survey of Vision-Language Models (VLMs) applied to various visual recognition tasks, including image classification, object detection, and semantic segmentation. It targets researchers and practitioners in computer vision and natural language processing, offering a structured overview of VLMs, their pre-training methods, transfer learning techniques, and knowledge distillation strategies. The project aims to consolidate and categorize the rapidly evolving field of VLMs for vision tasks.
How It Works
The repository is structured around a survey paper, "Vision-Language Models for Vision Tasks: A Survey," which systematically categorizes VLMs based on their application in visual recognition. It details pre-training methodologies (contrastive, generative, alignment), transfer learning approaches (prompt tuning, adapters), and knowledge distillation techniques. The survey also lists relevant datasets for both pre-training and evaluation across various vision tasks.
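Among the pre-training objectives the survey categorizes, the contrastive family (popularized by CLIP-style models) is the most common. The sketch below is an illustration of that idea, not code from this repository: a symmetric InfoNCE loss over a batch of paired image/text embeddings, written in NumPy with assumed function and parameter names.

```python
# Illustrative sketch of a CLIP-style symmetric contrastive loss.
# All names (clip_contrastive_loss, temperature) are assumptions for
# illustration; the surveyed papers define their own variants.
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss for a batch of matched image/text pairs."""
    # L2-normalize so dot products become cosine similarities.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = image_emb @ text_emb.T / temperature  # (B, B) similarity matrix
    labels = np.arange(logits.shape[0])  # matching pairs lie on the diagonal

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the image-to-text and text-to-image directions.
    return (cross_entropy(logits) + cross_entropy(logits.T)) / 2

rng = np.random.default_rng(0)
imgs = rng.normal(size=(4, 8))   # toy image embeddings, batch of 4
txts = rng.normal(size=(4, 8))   # toy text embeddings, same batch
loss = clip_contrastive_loss(imgs, txts)
```

The key design point the survey highlights: both encoders are trained jointly so that matched image/text pairs score higher than all in-batch mismatches, which is what enables zero-shot transfer to recognition tasks.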
Quick Start & Requirements
This repository is a curated list of papers and links; there is nothing to install or run, and no specific software is required to browse it.
Maintenance & Community
The project is maintained by jingyi0000 and welcomes contributions via pull requests for missing papers. The last update was on March 24, 2025.
Licensing & Compatibility
The repository itself does not specify a license. The linked papers and code repositories will have their own respective licenses.
Limitations & Caveats
This repository is a survey and does not provide executable code or models. Its value is in its curated information and links to external resources.