Research paper and code for open-vocabulary object detection
OV-DINO provides a unified approach to open-vocabulary object detection, addressing the need for flexible and accurate detection across a wide range of categories. It is designed for researchers and practitioners in computer vision and deep learning who require state-of-the-art performance in zero-shot and fine-tuned detection tasks. The project offers significant improvements over previous methods, particularly in zero-shot evaluation on challenging benchmarks like COCO and LVIS.
How It Works
OV-DINO employs a Unified Data Integration pipeline for end-to-end pre-training on diverse datasets, including Objects365, GoldG, and CC1M. A key innovation is the Language-Aware Selective Fusion module, which enhances the model's vision-language understanding by selectively fusing information based on linguistic context. This approach leads to improved zero-shot capabilities and overall detection accuracy.
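To make the fusion idea concrete, below is a minimal PyTorch sketch of language-aware selective fusion under common transformer assumptions: text prompt embeddings first attend over image tokens to select relevant visual content, then a gated cross-attention injects that language-conditioned context back into the visual stream. The class, parameter names, shapes, and gating mechanism are illustrative, not the paper's actual implementation.

```python
# Toy sketch of language-aware selective fusion (illustrative only;
# not the OV-DINO authors' implementation).
import torch
import torch.nn as nn

class SelectiveFusionSketch(nn.Module):
    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # Step 1: text queries attend over image tokens to *select*
        # the visual content relevant to each prompt.
        self.select = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Step 2: image tokens attend over the selected context to *fuse*
        # language information back into the visual stream.
        self.fuse = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # A sigmoid gate decides, per token, how much fused context to inject.
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, image_tokens: torch.Tensor, text_embeds: torch.Tensor):
        # image_tokens: (B, N, C) flattened visual features
        # text_embeds:  (B, T, C) embeddings of category prompts
        selected, _ = self.select(text_embeds, image_tokens, image_tokens)
        fused, _ = self.fuse(image_tokens, selected, selected)
        return image_tokens + self.gate(fused) * fused

# Example: 900 visual tokens, 80 category prompts, 256-d features.
x = torch.randn(2, 900, 256)
t = torch.randn(2, 80, 256)
print(SelectiveFusionSketch()(x, t).shape)  # torch.Size([2, 900, 256])
```

The gate is one plausible reading of "selective": it lets the model down-weight language context for visual tokens with no matching prompt instead of fusing text features uniformly everywhere.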
Quick Start & Requirements
Setup requires CUDA_HOME to be set if you are not using CUDA 11.6. Create a conda environment (ovdino), install PyTorch 1.13.1 with CUDA 11.6, and then install the project dependencies using pip install -e detectron2-717ab9 and pip install -e ./. An optional environment (ovsam) is provided for OV-SAM integration, and pre-trained checkpoints are placed under inits/ovdino.
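After installation, a short Python check can confirm the environment matches the pinned versions; the expected version numbers below come from the setup notes above, and this snippet is a suggested sanity check rather than part of the project.

```python
# Post-install sanity check: verifies the pinned PyTorch/CUDA versions
# and that detectron2 is importable.
import torch
import detectron2

print(torch.__version__)          # expect 1.13.1
print(torch.version.cuda)         # expect 11.6
print(torch.cuda.is_available())  # True if a GPU and driver are visible
print(detectron2.__version__)
```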
Highlighted Details
Maintenance & Community
The project is actively updated; recent releases include pre-training code for O365 and the OV-SAM integration. The authors are responsive to issues, including questions about fine-tuning.
Licensing & Compatibility
The repository does not explicitly state a license in the README. It references and builds on other open-source projects such as Detectron2, detrex, GLIP, G-DINO, and YOLO-World, but without an explicit license of its own, commercial use would require confirmation from the authors.
Limitations & Caveats
The project is still under active development, with several features planned, including ONNX export and integration into 🤗 Transformers. Pre-training code for the full set of datasets is marked "Coming soon." The README also notes that images uploaded to the web demo are stored for failure analysis.