OV-DINO by wanghao9610

Research paper for open-vocabulary object detection

created 1 year ago
356 stars

Top 79.5% on sourcepulse

Project Summary

OV-DINO provides a unified approach to open-vocabulary object detection, addressing the need for flexible and accurate detection across a wide range of categories. It is designed for researchers and practitioners in computer vision and deep learning who require state-of-the-art performance in zero-shot and fine-tuned detection tasks. The project offers significant improvements over previous methods, particularly in zero-shot evaluation on challenging benchmarks like COCO and LVIS.

How It Works

OV-DINO employs a Unified Data Integration pipeline for end-to-end pre-training on diverse datasets, including Objects365, GoldG, and CC1M. A key innovation is the Language-Aware Selective Fusion module, which enhances the model's vision-language understanding by selectively fusing information based on linguistic context. This approach leads to improved zero-shot capabilities and overall detection accuracy.
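To make the fusion idea concrete, here is a minimal, illustrative sketch of gated vision-language fusion in NumPy: each vision token learns how much pooled text context to absorb via a scalar gate. The function name, shapes, and gating form are assumptions for illustration, not OV-DINO's exact formulation.

```python
import numpy as np

def selective_fusion(vision_feats, text_feats, w_gate):
    """Illustrative gated fusion (hypothetical, not OV-DINO's exact module).

    vision_feats: (N, D) vision token embeddings
    text_feats:   (T, D) text token embeddings
    w_gate:       (2 * D,) gating weights (would be learned in practice)
    """
    # Pool the text tokens into a single context vector.
    text_ctx = text_feats.mean(axis=0)                                   # (D,)
    # Pair each vision token with the text context.
    joint = np.concatenate(
        [vision_feats, np.tile(text_ctx, (len(vision_feats), 1))], axis=1
    )                                                                    # (N, 2D)
    # Per-token scalar gate in (0, 1): how much text to fuse in.
    gate = 1.0 / (1.0 + np.exp(-joint @ w_gate))                         # (N,)
    # Blend the text context into each vision token, weighted by its gate.
    return vision_feats + gate[:, None] * text_ctx[None, :]

rng = np.random.default_rng(0)
v = rng.normal(size=(4, 8))   # 4 vision tokens, dim 8
t = rng.normal(size=(3, 8))   # 3 text tokens, dim 8
w = rng.normal(size=(16,))
fused = selective_fusion(v, t, w)
print(fused.shape)  # (4, 8)
```

The point of the gate is selectivity: tokens whose joint representation suggests the text is relevant absorb more linguistic context, while others stay close to their original visual embedding.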

Quick Start & Requirements

  • Installation: Clone the repository; set CUDA_HOME if CUDA 11.6 is not your system default; create a conda environment (ovdino); install PyTorch 1.13.1 built for CUDA 11.6; then install the project dependencies with pip install -e detectron2-717ab9 followed by pip install -e ./. An optional environment (ovsam) is provided for OV-SAM integration.
  • Data: Requires downloading and organizing COCO, LVIS, and Objects365 datasets. Symbolic links are used to manage data paths.
  • Pre-trained Models: Download from the Model Zoo and place in inits/ovdino.
  • Resources: Evaluation on LVIS Val requires approximately 250GB of memory. Pre-training on Objects365 is demonstrated on 2 nodes with 8 A100 GPUs each.
  • Links: Paper, HuggingFace, Demo.
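The Data step above relies on symbolic links so datasets can live outside the repository. A minimal Python sketch of that pattern is below; the directory names and layout are illustrative assumptions, not the repo's exact convention (temporary directories stand in for real paths so the sketch is self-contained).

```python
import os
import tempfile

# Hypothetical layout: link externally stored datasets into the directory
# the repo expects, as the README's symlink-based setup suggests.
# All paths here are illustrative, not the repo's exact convention.
storage = tempfile.mkdtemp()    # stands in for real dataset storage
repo_data = tempfile.mkdtemp()  # stands in for the repo's data directory

for name in ("coco", "lvis", "objects365"):
    src = os.path.join(storage, name)
    os.makedirs(src, exist_ok=True)  # real datasets would already exist here
    dst = os.path.join(repo_data, name)
    if not os.path.islink(dst):
        os.symlink(src, dst)         # point the repo at the external data

print(sorted(os.listdir(repo_data)))  # ['coco', 'lvis', 'objects365']
```

Keeping only links inside the repo means the large dataset files can sit on a separate volume and be shared across checkouts without copying.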

Highlighted Details

  • Achieves state-of-the-art zero-shot performance, with improvements of +2.5% AP on COCO and +12.7% AP on LVIS compared to G-DINO.
  • Offers fine-tuning code for custom datasets and pre-training code for the O365 dataset.
  • Includes local inference and web inference demos for easy deployment and testing.
  • Integrates with SAM2 for enhanced segmentation capabilities (OV-SAM).

Maintenance & Community

The project is actively updated, with recent releases including pre-training code for O365 and the OV-SAM integration. The authors are responsive to issues, such as questions about fine-tuning.

Licensing & Compatibility

The repository does not explicitly state a license in the README. However, it references other open-source projects like Detectron2, detrex, GLIP, G-DINO, and YOLO-World, suggesting a permissive open-source orientation. Compatibility for commercial use would require explicit license confirmation.

Limitations & Caveats

The project is still under active development, with several features planned, including ONNX exporting and integration into 🤗 Transformers. The pre-training code for all datasets is noted as "Coming soon." The README mentions that uploaded images for the web demo are stored for failure analysis.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 47 stars in the last 90 days
