wanghao9610/OV-DINO: Research paper for open-vocabulary object detection
OV-DINO provides a unified approach to open-vocabulary object detection, addressing the need for flexible and accurate detection across a wide range of categories. It is designed for researchers and practitioners in computer vision and deep learning who require state-of-the-art performance in zero-shot and fine-tuned detection tasks. The project offers significant improvements over previous methods, particularly in zero-shot evaluation on challenging benchmarks like COCO and LVIS.
How It Works
OV-DINO employs a Unified Data Integration pipeline for end-to-end pre-training on diverse datasets, including Objects365, GoldG, and CC1M. A key innovation is the Language-Aware Selective Fusion module, which enhances the model's vision-language understanding by selectively fusing information based on linguistic context. This approach leads to improved zero-shot capabilities and overall detection accuracy.
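The summary above does not spell out how the fusion module works internally, so the following is a minimal, hypothetical PyTorch sketch of what language-aware selective fusion could look like: visual object queries cross-attend to category text embeddings, and a learned sigmoid gate decides per query how much of the fused language signal to admit. The class name, gating design, and tensor shapes are illustrative assumptions, not the official OV-DINO implementation.

```python
import torch
import torch.nn as nn

class LanguageAwareSelectiveFusion(nn.Module):
    """Illustrative sketch (not the official OV-DINO code): fuse text
    embeddings into visual queries via cross-attention, then gate the
    fused signal so only linguistically relevant context is injected."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(dim * 2, dim), nn.Sigmoid())
        self.norm = nn.LayerNorm(dim)

    def forward(self, queries: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        # queries: (B, N, D) visual object queries; text_embeds: (B, T, D) category prompts.
        fused, _ = self.cross_attn(queries, text_embeds, text_embeds)
        # Per-query gate: decide how much language context to admit.
        g = self.gate(torch.cat([queries, fused], dim=-1))
        return self.norm(queries + g * fused)

# Smoke test with toy shapes.
if __name__ == "__main__":
    m = LanguageAwareSelectiveFusion()
    q = torch.randn(2, 900, 256)   # 900 DETR-style queries
    t = torch.randn(2, 80, 256)    # 80 category-name embeddings
    print(m(q, t).shape)           # torch.Size([2, 900, 256])
```

The gating step is what makes the fusion "selective" in spirit: rather than unconditionally adding attended text features, each query modulates how much linguistic context it absorbs.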
Quick Start & Requirements
Set CUDA_HOME if CUDA 11.6 is not the default toolkit, create a conda environment (ovdino), install PyTorch 1.13.1 built against CUDA 11.6, and then install the project dependencies with pip install -e detectron2-717ab9 followed by pip install -e ./. Pre-trained checkpoints are placed under inits/ovdino. An optional second environment (ovsam) is provided for the OV-SAM integration.
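Condensed into commands, the setup might look like the sketch below. The Python version, torchvision pin, and wheel index URL are assumptions filled in from the PyTorch 1.13.1 / CUDA 11.6 requirement; verify the exact steps against the repository README.

```bash
# Hypothetical condensation of the setup steps above; verify against the README.
export CUDA_HOME=/usr/local/cuda-11.6     # only if CUDA 11.6 is not the system default
conda create -n ovdino python=3.9 -y      # Python version is an assumption
conda activate ovdino
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 \
    --extra-index-url https://download.pytorch.org/whl/cu116
pip install -e ./detectron2-717ab9        # pinned Detectron2 copy bundled with the repo
pip install -e ./                         # OV-DINO itself
```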
Maintenance & Community
The project is actively updated, with recent releases adding pre-training code for O365 and the OV-SAM integration. The authors are responsive to issues, including questions about fine-tuning.
Licensing & Compatibility
The repository does not explicitly state a license in the README. It references other open-source projects such as Detectron2, detrex, GLIP, G-DINO, and YOLO-World, suggesting a permissive open-source orientation, but commercial use would require explicit confirmation of the licensing terms.
Limitations & Caveats
The project is still under active development: several features are planned, including ONNX export and integration into 🤗 Transformers, and the pre-training code for all datasets is noted as "Coming soon." The README also notes that images uploaded to the web demo are stored for failure analysis.