Vision-Language-Models-Overview by zli12321

Survey of Vision-Language Models (VLMs)

created 8 months ago
297 stars

Top 90.4% on sourcepulse

Project Summary

This repository provides a curated survey of Vision-Language Models (VLMs), covering state-of-the-art models, benchmarks, post-training techniques, applications, and challenges. It consolidates information on VLMs for researchers and practitioners and offers a structured overview of a rapidly evolving field.

How It Works

The project acts as a curated knowledge base that organizes links to papers, GitHub repositories, and datasets. It categorizes VLMs by architecture and training data, lists evaluation benchmarks with their metrics and sources, and details post-training methods such as RL alignment and prompt engineering. This structure makes it easier to navigate VLM research and development.

Quick Start & Requirements

This repository is a collection of links and does not require installation or execution. It serves as a reference guide.

Highlighted Details

  • Comprehensive tables detail over 30 state-of-the-art VLMs, including their architectures, training data, and parameter counts.
  • An extensive list of over 50 benchmark datasets and simulators covers diverse VLM evaluation tasks, from visual reasoning to embodied AI.
  • Detailed sections on post-training methods highlight Reinforcement Learning (RL) alignment techniques and prompt engineering strategies.
  • Applications are categorized across robotics, embodied AI, generative visual media, and human-centered AI, showcasing real-world VLM use cases.

Maintenance & Community

The repository is actively maintained. Papers marked with a star are contributions from the maintainers. Users are encouraged to contribute and join discussions via the GitHub repository.

Licensing & Compatibility

The repository itself is a collection of links to external resources, each with its own licensing. The project does not impose specific licensing restrictions beyond those of the linked content.

Limitations & Caveats

As a survey, this repository does not provide executable code or models. The information is a snapshot of the field, and the rapid pace of VLM development means some details may become outdated.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

118 stars in the last 90 days
