Object detection research code for open-vocabulary scenarios
This repository provides the official implementation for "Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection," a NeurIPS 2022 paper. It addresses limitations in current open-vocabulary detection methods by aligning object-centric language embeddings and improving generalization to novel classes using image-level supervision. The target audience is researchers and practitioners in computer vision focused on object detection and open-vocabulary tasks.
How It Works
The approach bridges the gap between image-level and object-level representations for open-vocabulary detection. It introduces Region-based Knowledge Distillation (RKD) to adapt image-centric language embeddings (from CLIP) to be object-centric, improving localization. Additionally, Pseudo Image-level Supervision (PIS) leverages weak image-level supervision from multi-modal Vision Transformers (MAVL) to enhance generalization to novel classes via a pseudo-labeling process. A novel Weight Transfer function efficiently combines these two components, aggregating their complementary strengths for superior performance.
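The distillation idea behind RKD can be illustrated with a small pure-Python sketch. Assume the detector (student) produces one embedding per region, and CLIP's image encoder (teacher) embeds the same region crops. The sketch combines a point-wise term (mean L1 distance between matched, L2-normalised embeddings) with a relational term (mean absolute difference between the two cosine-similarity matrices). The function names, the `relation_weight` parameter, and the exact normalisation are illustrative assumptions, not the repository's code.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    # Scale to unit length; a zero vector is returned unchanged.
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def point_embedding_loss(student: List[Vector], teacher: List[Vector]) -> float:
    # Mean L1 distance between each student region embedding and the
    # teacher (CLIP) embedding of the same region, after normalisation.
    total = 0.0
    for s, t in zip(student, teacher):
        s_n, t_n = l2_normalize(s), l2_normalize(t)
        total += sum(abs(a - b) for a, b in zip(s_n, t_n))
    return total / len(student)


def similarity_matrix(embs: List[Vector]) -> List[Vector]:
    # Pairwise cosine similarities between all embeddings in the batch.
    normed = [l2_normalize(e) for e in embs]
    return [[sum(a * b for a, b in zip(u, v)) for v in normed] for u in normed]


def relation_loss(student: List[Vector], teacher: List[Vector]) -> float:
    # Match the structure of the embedding space, not just individual points.
    ks, kt = similarity_matrix(student), similarity_matrix(teacher)
    n = len(ks)
    return sum(abs(ks[i][j] - kt[i][j]) for i in range(n) for j in range(n)) / (n * n)


def rkd_loss(student: List[Vector], teacher: List[Vector], relation_weight: float = 1.0) -> float:
    # Hypothetical combination: point-wise term plus weighted relational term.
    return point_embedding_loss(student, teacher) + relation_weight * relation_loss(student, teacher)
```

Because both terms operate on normalised embeddings, only the direction of each region embedding is distilled, which matches how CLIP embeddings are compared by cosine similarity.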
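The pseudo-labeling step in PIS can be pictured as turning image-level class names into box-level targets. A minimal sketch, assuming MAVL has already returned scored, class-specific box proposals for each class name attached to an image: for every class, the highest-scoring proposal becomes that class's pseudo box label. The `pseudo_label` function, the `(x1, y1, x2, y2)` box format, and the `min_score` threshold are hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) format


def pseudo_label(
    proposals: Dict[str, List[Tuple[Box, float]]],
    min_score: float = 0.0,
) -> Dict[str, Box]:
    """Pick the top-scoring proposal per image-level class name.

    `proposals` maps each class name to a list of (box, score) pairs,
    e.g. as produced by a class-agnostic/multi-modal proposal model.
    Classes with no proposals, or whose best score is below
    `min_score`, receive no pseudo label.
    """
    labels: Dict[str, Box] = {}
    for cls, candidates in proposals.items():
        if not candidates:
            continue
        box, score = max(candidates, key=lambda c: c[1])
        if score >= min_score:
            labels[cls] = box
    return labels


example = {
    "dog": [((0, 0, 10, 10), 0.3), ((5, 5, 20, 20), 0.9)],
    "cat": [],
}
print(pseudo_label(example))
```

Keeping only one box per class is a conservative choice: it trades recall for fewer noisy targets, which matters when the labels come from weak image-level supervision.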
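One way to read the Weight Transfer function is as a small learned mapping that predicts one branch's projection weights from the other's, so the two branches share information instead of being trained independently. The sketch below assumes a two-layer MLP with a LeakyReLU activation applied to a flattened weight vector; the class name, hidden size, slope, and Gaussian initialisation are all illustrative assumptions rather than the repository's configuration.

```python
import random
from typing import List


def leaky_relu(x: float, slope: float = 0.1) -> float:
    # Pass positives through; scale negatives by a small slope.
    return x if x >= 0 else slope * x


class WeightTransfer:
    """Hypothetical two-layer MLP mapping a weight vector to a new one."""

    def __init__(self, dim: int, hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)  # seeded for reproducible illustration
        self.w1 = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(hidden)]
        self.w2 = [[rng.gauss(0.0, 0.02) for _ in range(hidden)] for _ in range(dim)]

    def __call__(self, weights: List[float]) -> List[float]:
        # Layer 1: linear projection to the hidden size, then LeakyReLU.
        h = [leaky_relu(sum(a * b for a, b in zip(row, weights))) for row in self.w1]
        # Layer 2: linear projection back to the original weight dimension.
        return [sum(a * b for a, b in zip(row, h)) for row in self.w2]


# Predict (hypothetical) PIS-branch weights from RKD-branch weights.
rkd_weights = [0.5, -0.2, 0.1, 0.3]
transfer = WeightTransfer(dim=len(rkd_weights), hidden=8)
pis_weights = transfer(rkd_weights)
```

In a real detector the mapping would be trained jointly with both branches; the pure-Python version only shows the data flow from one set of weights to the other.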
Quick Start & Requirements
Installation and setup instructions are provided in INSTALL.md.