Survey of Vision-Language Models (VLMs)
This repository provides a comprehensive, curated survey of Vision-Language Models (VLMs), covering state-of-the-art models, benchmarks, post-training techniques, applications, and challenges. It serves researchers and practitioners by consolidating information on VLMs and offering a structured overview of a rapidly evolving field.
How It Works
The project acts as a curated knowledge base, organizing links to papers, GitHub repositories, and datasets. It categorizes VLMs by architecture and training data, lists evaluation benchmarks along with their metrics and sources, and details post-training methods such as RL-based alignment and prompt engineering. This structure makes it easy to navigate the complex landscape of VLM research and development.
Quick Start & Requirements
This repository is a collection of links and does not require installation or execution. It serves as a reference guide.
Maintenance & Community
The repository is actively maintained; papers contributed by the maintainers are marked with a star. Users are encouraged to contribute and discuss via the GitHub repository.
Licensing & Compatibility
The repository itself is a collection of links to external resources, each with its own licensing. The project does not impose specific licensing restrictions beyond those of the linked content.
Limitations & Caveats
As a survey, this repository does not provide executable code or models. The information is a snapshot of the field, and the rapid pace of VLM development means some details may become outdated.