Awesome-Open-Vocabulary by jianzongwu

Survey on open vocabulary learning methods (object detection, segmentation, tracking)

Created 2 years ago

979 stars

Top 37.7% on SourcePulse

Project Summary

This repository provides a comprehensive survey and benchmark of recent advancements in Open Vocabulary Learning (OVL), covering detection, segmentation, and video understanding tasks. It is a valuable resource for researchers and practitioners in computer vision aiming to build models that can recognize and process an open, unbounded set of visual concepts beyond predefined categories.

How It Works

The survey categorizes OVL methods by their core technique: leveraging vision-language models (VLMs) such as CLIP, using captions as auxiliary data, generating pseudo-labels, or employing diffusion models. It tracks each paper's venue, keywords, and associated code repository, offering a structured overview of the field's evolution and key methodologies.
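To make the VLM-based category more concrete, here is a minimal sketch of the idea most such methods build on: classifying a visual region by comparing its embedding against text embeddings of arbitrary class names. It assumes the embeddings have already been produced by a CLIP-style image and text encoder; the toy vectors below stand in for real encoder outputs, and `open_vocab_classify` is a hypothetical helper, not code from the survey or any linked repository.

```python
import numpy as np


def open_vocab_classify(region_emb, text_embs, class_names, temperature=0.01):
    """Assign each region embedding to the most similar class-name embedding.

    region_emb:  (R, D) visual embeddings, assumed to come from a VLM image encoder.
    text_embs:   (C, D) text embeddings, one per class-name prompt.
    class_names: list of C names; the vocabulary is whatever is passed here.
    """
    # L2-normalise both sides so that dot products become cosine similarities.
    r = region_emb / np.linalg.norm(region_emb, axis=1, keepdims=True)
    t = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = (r @ t.T) / temperature  # shape (R, C)
    # Numerically stable softmax over the vocabulary for each region.
    logits = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    labels = [class_names[i] for i in probs.argmax(axis=1)]
    return labels, probs


# Toy example: three "text embeddings" and two "region embeddings" (made-up vectors).
class_names = ["cat", "dog", "zebra"]
text_embs = np.eye(3)
region_emb = np.array([[0.9, 0.1, 0.0],
                       [0.0, 0.2, 0.8]])
labels, probs = open_vocab_classify(region_emb, text_embs, class_names)
print(labels)
```

Because the vocabulary is just the rows of `text_embs`, recognizing a new category amounts to embedding one more class name; no visual weights change. The `temperature` value here is illustrative only.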

Quick Start & Requirements

This repository is a survey and benchmark tracker, not a runnable codebase. It links to various research papers and their associated code implementations, which may have their own specific installation and hardware requirements (e.g., Python, PyTorch, CUDA, GPUs).

Highlighted Details

The first comprehensive survey dedicated to Open Vocabulary Learning across detection, segmentation, and video understanding.
Includes related domains like foundation model tuning and open-world detection.
Provides detailed results and comparisons for representative OVL approaches.
Features a structured categorization of methods using keywords like vlm., cap., pl., diff., and unify.
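One way to picture the keyword-based categorization is as a tag set attached to each tracked paper. The sketch below is a hypothetical data structure, not the repository's actual format: `PaperEntry` and `filter_by_tags` are illustrative names, the tag descriptions are inferred from the technique list in this summary, "unify" is left undefined because the summary does not explain it, and the two entries are placeholders rather than real papers.

```python
from dataclasses import dataclass, field
from typing import Optional, Set

# Tag meanings inferred from this summary; "unify" is not defined here.
TAGS = {
    "vlm.": "leverages vision-language models such as CLIP",
    "cap.": "uses captions as auxiliary data",
    "pl.": "generates pseudo-labels",
    "diff.": "employs diffusion models",
    "unify": "(not defined in this summary)",
}


@dataclass
class PaperEntry:
    title: str
    venue: str
    keywords: Set[str] = field(default_factory=set)
    code_url: Optional[str] = None  # link to the paper's code, if any


def filter_by_tags(entries, *tags):
    """Return the entries that carry every requested tag."""
    unknown = set(tags) - set(TAGS)
    if unknown:
        raise ValueError(f"unknown tags: {sorted(unknown)}")
    return [e for e in entries if set(tags) <= e.keywords]


# Placeholder entries for illustration only.
entries = [
    PaperEntry("Paper A (placeholder)", "Venue X", {"vlm.", "pl."}),
    PaperEntry("Paper B (placeholder)", "Venue Y", {"cap."}),
]
print([e.title for e in filter_by_tags(entries, "vlm.")])
```

Rejecting unknown tags keeps a typo such as "vlm" (without the period) from silently returning an empty list.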

Maintenance & Community

The repository is actively maintained, with updates recorded periodically. Researchers are encouraged to contribute missing papers or suggestions via pull requests. Contact information for the authors is provided for inquiries.

Licensing & Compatibility

The repository itself does not specify a license. Individual code repositories linked within the survey will have their own licenses, which may vary and could include restrictions on commercial use.

Limitations & Caveats

Because research in this area is growing rapidly, the survey acknowledges that it may not cover every paper published on arXiv. Its paper list was last updated on January 10, 2024, and a T-PAMI version is planned for March 2024.


Explore Similar Projects

Forge_VFM4AD by zhanghm1995

vlmrun-cookbook by vlm-run

ml-papers by rosinality

awesome-described-object-detection by Charles-Xie

OV-DINO by wanghao9610

Awesome-Open-Vocabulary-Semantic-Segmentation by Qinying-Liu

Awesome-CV-Foundational-Models by awaisrauf

awesome-foundation-and-multimodal-models by SkalskiP

Awesome-Foundation-Models by uncbiag

Visual-RFT by Liuziyu77

Vary by Ucas-HaoranWei

notebooks by roboflow