GroundingDINO by IDEA-Research

Research paper implementation: open-set object detection via grounded pre-training

created 2 years ago
8,580 stars

Project Summary

Grounding DINO is an open-source PyTorch implementation of open-set object detection: it localizes objects described by natural-language prompts, including categories that were not fixed in advance. It is aimed at researchers and developers working in computer vision, with applications such as image editing and automated annotation.

How It Works

Grounding DINO combines the DINO (DETR with Improved deNoising anchOr boxes) detector with grounded pre-training, which lets it localize objects from textual descriptions. Its pipeline consists of a text backbone, an image backbone, a feature enhancer that fuses the two modalities, language-guided query selection, and a cross-modality decoder; together these give it strong zero-shot performance.
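A simplified sketch of the language-guided query selection step, in plain Python: each image token is scored by its highest dot-product similarity to any text token, and the top-scoring tokens are kept to initialize decoder queries. This assumes both feature sets already share one embedding dimension; the real model works on batched PyTorch tensors, and select_queries is a hypothetical name, not a function from the repository.

```python
def select_queries(image_feats, text_feats, num_queries):
    """Return indices of the image tokens that best match any text token.

    image_feats: list of N_I feature vectors (lists of d floats)
    text_feats:  list of N_T feature vectors (lists of d floats)
    """

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Score each image token by its maximum similarity over all text tokens.
    scores = [max(dot(img, txt) for txt in text_feats) for img in image_feats]
    # Rank image tokens from highest to lowest score and keep the top ones.
    ranked = sorted(range(len(image_feats)), key=lambda i: scores[i], reverse=True)
    return ranked[:num_queries]


print(select_queries([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]], [[1.0, 0.0]], 2))
```

Here the scores are 1.0, 0.0 and 0.5, so the call prints [0, 2]: the tokens most aligned with the text are chosen.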

Quick Start & Requirements

  • Install: pip install -e . within the cloned repository.
  • Prerequisites: Python, PyTorch. CUDA is recommended for GPU acceleration; CPU-only mode is supported. Ensure CUDA_HOME is set correctly if using CUDA.
  • Model Weights: Download groundingdino_swint_ogc.pth from the releases page.
  • Demo: Run inference with CUDA_VISIBLE_DEVICES={GPU ID} python demo/inference_on_a_image.py -c groundingdino/config/GroundingDINO_SwinT_OGC.py -p weights/groundingdino_swint_ogc.pth -i image_you_want_to_detect.jpg -o "output_dir" -t "your text prompt".
  • Resources: Requires downloading pre-trained model weights.
  • Links: Paper, Demo, Colab Demo
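When calling the model from Python rather than through the demo script, two small pieces of glue are usually needed: a prompt listing several categories, and conversion of predicted boxes to pixel coordinates. The sketch below assumes, following the repository's examples (verify against the current code), that category names are separated by " . " in the prompt and that predicted boxes are normalized (cx, cy, w, h). Both helper names are hypothetical and are not part of the GroundingDINO API.

```python
def build_caption(categories):
    """Join category names into one prompt, e.g. "chair . person . dog ."."""
    # Assumed convention: lower-case names separated by " . ", ending in " .".
    names = [name.strip().lower() for name in categories if name.strip()]
    return " . ".join(names) + " ."


def cxcywh_to_xyxy(box, width, height):
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    # Offset the centre by half the box size, then scale to pixel units.
    return (
        (cx - w / 2) * width,
        (cy - h / 2) * height,
        (cx + w / 2) * width,
        (cy + h / 2) * height,
    )


print(build_caption(["Chair", "person", "dog"]))
print(cxcywh_to_xyxy((0.5, 0.5, 0.5, 0.5), 200, 100))
```

With these inputs the script prints chair . person . dog . followed by (50.0, 25.0, 150.0, 75.0).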

Highlighted Details

  • Achieves 52.5 AP on COCO zero-shot transfer (no COCO training data) and 63.0 AP after fine-tuning on COCO.
  • Integrates with Stable Diffusion and GLIGEN for controllable image editing.
  • Supports CPU-only inference.
  • Offers a Gradio Web UI demo.

Maintenance & Community

The project is maintained by IDEA-Research (IDEA-CVR), although the last commit was about 11 months ago. Related projects include Grounded-SAM and Semantic-SAM. No dedicated community support channels are listed; the project is run by the authors' research group.

Licensing & Compatibility

The README does not state a license. Check the repository's license file before relying on it for commercial use or linking it into closed-source software; research code is often released under non-commercial terms, so this should not be assumed either way.

Limitations & Caveats

  • Training code is not yet released.
  • If the installation steps are not followed exactly, the package can fail with NameError: name '_C' is not defined; the README's fix is to re-clone the repository and reinstall.
  • The README's own COCO zero-shot evaluation reports 48.5 AP, lower than the headline 52.5 AP figure.
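Before re-cloning, a quick diagnostic can narrow down the '_C' error. The sketch below checks whether CUDA_HOME is set (the Quick Start asks for this when using CUDA) and whether the compiled extension can be imported. The module path groundingdino._C is an assumption inferred from the error message, and check_install is a hypothetical helper.

```python
import importlib
import os


def check_install(ext_module="groundingdino._C"):
    """Report whether CUDA_HOME is set and whether the compiled extension imports.

    ext_module defaults to "groundingdino._C", an assumed module path.
    """
    cuda_home = os.environ.get("CUDA_HOME")  # None if the variable is unset
    try:
        importlib.import_module(ext_module)
        ext_ok = True
    except ImportError:  # also catches ModuleNotFoundError
        ext_ok = False
    return {"cuda_home": cuda_home, "extension_importable": ext_ok}


if __name__ == "__main__":
    print(check_install())
```

If extension_importable is False while cuda_home is None, setting CUDA_HOME and reinstalling is a reasonable first step.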

Health Check
  • Last commit: 11 months ago
  • Responsiveness: 1 day
  • Pull requests (last 30 days): 3
  • Issues (last 30 days): 4
  • Star history: 658 stars in the last 90 days
