Awesome-Open-Vocabulary  by jianzongwu

Survey on open vocabulary learning methods (object detection, segmentation, tracking)

created 2 years ago
944 stars

Top 39.7% on sourcepulse

Project Summary

This repository provides a comprehensive survey and benchmark of recent advancements in Open Vocabulary Learning (OVL), covering detection, segmentation, and video understanding tasks. It is a valuable resource for researchers and practitioners in computer vision aiming to build models that can recognize and process an open, unbounded set of visual concepts beyond predefined categories.

How It Works

The survey categorizes OVL methods based on their core techniques, such as leveraging Vision-Language Models (VLMs like CLIP), using captions as auxiliary data, generating pseudo-labels, or employing diffusion models. It meticulously tracks papers, their venues, keywords, and associated code repositories, offering a structured overview of the field's evolution and key methodologies.

Quick Start & Requirements

This repository is a survey and benchmark tracker, not a runnable codebase. It links to various research papers and their associated code implementations, which may have their own specific installation and hardware requirements (e.g., Python, PyTorch, CUDA, GPUs).

Highlighted Details

  • The first comprehensive survey dedicated to Open Vocabulary Learning across detection, segmentation, and video understanding.
  • Includes related domains like foundation model tuning and open-world detection.
  • Provides detailed results and comparisons for representative OVL approaches.
  • Categorizes each method with short keyword tags such as vlm., cap., pl., diff., and unify.
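
The keyword tagging described above can be pictured as a small filterable catalog. The following is a minimal sketch, not the repository's actual data format: the `PaperEntry` class, the `filter_by_keyword` helper, and the example entries are all hypothetical, and the tag meanings are inferred from this summary (the meaning of `unify` in particular is an assumption).

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Tag meanings inferred from the summary text, not copied from the repository.
KEYWORD_MEANINGS = {
    "vlm.": "leverages a vision-language model such as CLIP",
    "cap.": "uses captions as auxiliary data",
    "pl.": "generates pseudo-labels",
    "diff.": "employs a diffusion model",
    "unify": "unified approach (assumed meaning)",
}


@dataclass
class PaperEntry:
    """One tracked paper: title, venue, keyword tags, optional code link."""
    title: str
    venue: str
    keywords: List[str] = field(default_factory=list)
    code_url: Optional[str] = None


def filter_by_keyword(entries: List[PaperEntry], keyword: str) -> List[PaperEntry]:
    """Return the entries tagged with `keyword`, preserving input order."""
    if keyword not in KEYWORD_MEANINGS:
        raise ValueError(f"unknown keyword tag: {keyword!r}")
    return [entry for entry in entries if keyword in entry.keywords]


# Placeholder entries for illustration only; these are not real papers.
EXAMPLE_ENTRIES = [
    PaperEntry("Hypothetical Paper A", "Venue X", ["vlm."]),
    PaperEntry("Hypothetical Paper B", "Venue Y", ["cap.", "pl."]),
    PaperEntry("Hypothetical Paper C", "Venue Z", ["vlm.", "diff."]),
]

if __name__ == "__main__":
    for entry in filter_by_keyword(EXAMPLE_ENTRIES, "vlm."):
        print(f"{entry.title} ({entry.venue}): {', '.join(entry.keywords)}")
```

A paper can carry several tags at once, so the sketch stores them as a list and filters by membership rather than assigning each paper to a single category.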

Maintenance & Community

The repository is actively maintained, with updates recorded periodically. Researchers are encouraged to contribute missing papers or suggestions via pull requests. Contact information for the authors is provided for inquiries.

Licensing & Compatibility

The repository itself does not specify a license. Individual code repositories linked within the survey will have their own licenses, which may vary and could include restrictions on commercial use.

Limitations & Caveats

Because research in this area is growing rapidly, the survey acknowledges that it may not cover every paper published on arXiv. Its most recent recorded update covers papers up to January 10, 2024, and a T-PAMI version was planned for March 2024.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

28 stars in the last 90 days
