object-centric-ovd by hanoonaR

Object detection research paper for open-vocabulary scenarios

created 3 years ago
294 stars

Project Summary

This repository provides the official implementation for "Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection," a NeurIPS 2022 paper. It addresses limitations in current open-vocabulary detection methods by aligning object-centric language embeddings and improving generalization to novel classes using image-level supervision. The target audience is researchers and practitioners in computer vision focused on object detection and open-vocabulary tasks.

How It Works

The approach bridges the gap between image-level and object-level representations for open-vocabulary detection. Region-based Knowledge Distillation (RKD) adapts the image-centric language embeddings from CLIP to be object-centric, which improves localization. Pseudo Image-level Supervision (PIS) uses weak image-level supervision, pseudo-labeled with a multi-modal Vision Transformer (MAVL), to improve generalization to novel classes. A Weight Transfer function then combines the two components efficiently so that their complementary strengths are aggregated, improving on a naive combination of the two.
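The paragraph above does not spell out the distillation objective. One common way to realize region-level distillation, assumed here purely for illustration, is to pull the detector's region embeddings toward CLIP image embeddings of the corresponding region crops with a point-wise L1 term, and to match the pairwise similarity structure of the two sets with a relationship term. The sketch below uses plain Python lists in place of PyTorch tensors. `rkd_loss` and its helpers, the L2 normalization, and the equal default weights are hypothetical choices, not the repository's API or its exact loss.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def l1_distill_loss(student: List[Vector], teacher: List[Vector]) -> float:
    """Mean over regions of the L1 distance between normalized embeddings."""
    total = 0.0
    for s, t in zip(student, teacher):
        s_n, t_n = l2_normalize(s), l2_normalize(t)
        total += sum(abs(a - b) for a, b in zip(s_n, t_n))
    return total / len(student)


def cosine_matrix(embs: List[Vector]) -> List[Vector]:
    """Pairwise cosine similarities between all region embeddings in a batch."""
    normed = [l2_normalize(e) for e in embs]
    return [[sum(a * b for a, b in zip(u, v)) for v in normed] for u in normed]


def relation_loss(student: List[Vector], teacher: List[Vector]) -> float:
    """Mean absolute difference between student and teacher similarity matrices."""
    s_sim, t_sim = cosine_matrix(student), cosine_matrix(teacher)
    n = len(student)
    diff = sum(abs(s_sim[i][j] - t_sim[i][j]) for i in range(n) for j in range(n))
    return diff / (n * n)


def rkd_loss(student: List[Vector], teacher: List[Vector],
             w_l1: float = 1.0, w_rel: float = 1.0) -> float:
    """Hypothetical RKD objective: weighted point-wise term plus relationship term.

    student: detector region embeddings; teacher: CLIP image embeddings of the
    same region crops (assumed non-empty and aligned one-to-one).
    """
    if not student or len(student) != len(teacher):
        raise ValueError("student and teacher must be non-empty and aligned")
    return w_l1 * l1_distill_loss(student, teacher) + w_rel * relation_loss(student, teacher)
```

Because both terms compare normalized embeddings, a student that differs from its teacher only by a per-region scale incurs zero loss. The relationship term adds a penalty when regions relate to one another differently in the student than in the teacher.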

Quick Start & Requirements

  • Installation: Clone the repository and follow instructions in INSTALL.md.
  • Prerequisites: PyTorch 1.10.0, CUDA 11.3.
  • Training: Requires 8 A100 GPUs. Training times range from 4.5 hours to 2.5 days depending on the configuration.
  • Demo: An interactive Colab notebook is available for custom detector creation.
  • Documentation: Installation instructions are in INSTALL.md.

Highlighted Details

  • Achieves state-of-the-art results on COCO and LVIS benchmarks for open-vocabulary detection.
  • Demonstrates significant gains on novel classes: 40.3 AP50 on COCO novel classes (an absolute gain of 11.9 AP50 over the previous state of the art) and a 5.0 mask AP gain on rare LVIS categories.
  • Ablation studies show that the proposed Weight Transfer method yields complementary gains over naively adding the RKD and PIS components.
  • Code is based on the Detic repository and utilizes the MViT model (MAVL).
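The summary does not describe how the Weight Transfer function is built. As a hypothetical illustration, the sketch below assumes a bias-free two-layer MLP with LeakyReLU that generates the PIS branch's projection weights from the RKD branch's projection weights, and assumes the two branch outputs are summed into the final region embedding. A naive combination, by contrast, would give the PIS branch its own independent weights. `WeightTransfer`, `region_embedding`, the 0.1 negative slope, and the initialization are all illustrative assumptions, and plain Python lists stand in for tensors.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply matrix m (rows x cols) by vector v (cols)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def leaky_relu(v: Vector, slope: float = 0.1) -> Vector:
    """Element-wise LeakyReLU; the 0.1 slope is an assumption of this sketch."""
    return [x if x > 0 else slope * x for x in v]


class WeightTransfer:
    """Hypothetical bias-free 2-layer MLP applied to each row of the RKD projection."""

    def __init__(self, in_dim: int, hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        s1, s2 = 1.0 / math.sqrt(in_dim), 1.0 / math.sqrt(hidden)
        self.w1 = [[rng.uniform(-s1, s1) for _ in range(in_dim)] for _ in range(hidden)]
        self.w2 = [[rng.uniform(-s2, s2) for _ in range(hidden)] for _ in range(in_dim)]

    def __call__(self, rkd_proj: Matrix) -> Matrix:
        # Each row of the PIS projection is generated from the matching RKD row,
        # so the PIS branch is tied to the RKD branch instead of learned separately.
        return [matvec(self.w2, leaky_relu(matvec(self.w1, row))) for row in rkd_proj]


def region_embedding(features: Vector, rkd_proj: Matrix, transfer: WeightTransfer) -> Vector:
    """Combine both branches; summing their outputs is an assumption of this sketch."""
    pis_proj = transfer(rkd_proj)
    rkd_out = matvec(rkd_proj, features)
    pis_out = matvec(pis_proj, features)
    return [a + b for a, b in zip(rkd_out, pis_out)]
```

For example, `region_embedding([1.0, 2.0, 3.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], WeightTransfer(3, 4))` returns a 2-dimensional embedding. Because the PIS weights in this sketch are a function of the RKD weights, gradients from a PIS loss would also reach the RKD projection during training unless explicitly stopped, which is one way the two components could share information.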

Maintenance & Community

  • The paper was accepted at NeurIPS 2022.
  • Questions can be sent to the authors by email, and issues can be raised on the repository.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • Training is resource-intensive, requiring multiple high-end GPUs (8xA100).
  • The README does not specify the license, which could impact commercial adoption.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 90 days
