Survey on open vocabulary learning methods (object detection, segmentation, tracking)
This repository provides a comprehensive survey and benchmark of recent advancements in Open Vocabulary Learning (OVL), covering detection, segmentation, and video understanding tasks. It is a valuable resource for researchers and practitioners in computer vision aiming to build models that can recognize and process an open, unbounded set of visual concepts beyond predefined categories.
How It Works
The survey categorizes OVL methods by core technique: leveraging Vision-Language Models (VLMs, such as CLIP), using captions as auxiliary data, generating pseudo-labels, or employing diffusion models. It tracks each paper's venue, keywords, and associated code repository, giving a structured overview of the field's evolution and key methodologies.
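As a rough illustration of the tracking scheme described above, the sketch below models one survey entry and groups entries by keyword tag. The repository does not define any such schema: the `PaperEntry` class, its field names, the `group_by_keyword` helper, and the sample entries are all hypothetical placeholders, and only the keyword tags themselves come from the survey.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class PaperEntry:
    """One tracked paper. Field names are illustrative assumptions."""
    title: str
    venue: str
    keywords: List[str] = field(default_factory=list)
    code_url: Optional[str] = None  # None when no code repository is linked


def group_by_keyword(entries: List[PaperEntry]) -> Dict[str, List[str]]:
    """Map each keyword tag to the titles of the papers that carry it."""
    groups = defaultdict(list)
    for entry in entries:
        for keyword in entry.keywords:
            groups[keyword].append(entry.title)
    return dict(groups)


# Placeholder entries only; these are not real papers from the survey.
entries = [
    PaperEntry("Example Paper A", "ExampleConf 2023", ["vlm."], "https://example.com/a"),
    PaperEntry("Example Paper B", "ExampleConf 2023", ["cap.", "pl."]),
    PaperEntry("Example Paper C", "arXiv", ["vlm.", "diff."]),
]

groups = group_by_keyword(entries)
print(groups["vlm."])
```

A paper can carry several tags, so the same title may appear under more than one keyword; that is why the grouping maps tags to lists rather than assigning each paper a single category.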
Quick Start & Requirements
This repository is a survey and benchmark tracker, not a runnable codebase. It links to various research papers and their associated code implementations, which may have their own specific installation and hardware requirements (e.g., Python, PyTorch, CUDA, GPUs).
Highlighted Details
Paper keywords include vlm., cap., pl., diff., and unify.
Maintenance & Community
The repository is actively maintained, with updates recorded periodically. Researchers are encouraged to contribute missing papers or suggestions via pull requests. Contact information for the authors is provided for inquiries.
Licensing & Compatibility
The repository itself does not specify a license. Individual code repositories linked within the survey will have their own licenses, which may vary and could include restrictions on commercial use.
Limitations & Caveats
Because research in this area is growing rapidly, the survey acknowledges that it may not cover every paper published on arXiv. It records papers up to January 10, 2024, and a T-PAMI version is planned for March 2024.