GroundingDINO by IDEA-Research

Implementation of a research paper on object detection via grounded pre-training

Created 2 years ago

9,542 stars

Top 5.3% on SourcePulse

2 Experts Love This Project

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Luis Capelo

Cofounder of Lightning AI

Project Summary

Grounding DINO is an open-source PyTorch implementation for open-set object detection, enabling users to detect any object specified by natural language prompts. It is designed for researchers and developers working on advanced computer vision tasks, offering high performance and flexibility for applications like image editing and automated annotation.

How It Works

Grounding DINO combines the DINO (DETR with Improved DeNoising Anchor Boxes) object detection framework with grounded pre-training, which lets it localize objects from textual descriptions and achieve strong zero-shot performance. The model has five main components: a text backbone, an image backbone, a feature enhancer, language-guided query selection, and a cross-modality decoder.
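To make the language-guided query selection step more concrete, here is a minimal sketch in plain Python. It assumes that selection works by scoring each image token with its best dot-product similarity to any text token and keeping the top-scoring tokens as decoder queries. The function name select_queries, the toy features, and the use of plain lists instead of PyTorch tensors are illustrative assumptions, not code from the repository.

```python
def select_queries(image_feats, text_feats, num_queries):
    """Return indices of the image tokens most similar to any text token.

    Illustrative only: the scoring rule (max dot product over text tokens,
    then top-k) is an assumption about how the selection could be done.
    """
    scores = []
    for img_vec in image_feats:
        # Dot-product similarity of this image token against every text token.
        sims = [sum(a * b for a, b in zip(img_vec, txt_vec)) for txt_vec in text_feats]
        # Keep only the best match over the text tokens.
        scores.append(max(sims))
    # Rank image tokens by score, highest first, and keep the top num_queries.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:num_queries]


# Toy 2-D features: four image tokens and two text tokens (made-up values).
image_feats = [[1.0, 0.0], [0.0, 1.0], [0.25, 0.25], [-1.0, -1.0]]
text_feats = [[1.0, 0.0], [0.0, 2.0]]
selected = select_queries(image_feats, text_feats, num_queries=2)
print(selected)
```

In this toy setup the image tokens that align best with the text are the ones passed on as queries, which is the intuition behind letting the prompt steer which regions the decoder attends to.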

Quick Start & Requirements

Install: Clone the repository, then run pip install -e . from its root.
Prerequisites: Python and PyTorch. CUDA is recommended for GPU acceleration, and CUDA_HOME must be set correctly when using it; CPU-only mode is also supported.
Model Weights: Download the pre-trained checkpoint groundingdino_swint_ogc.pth from the releases page.
Demo: Run inference with the following command, replacing {GPU ID}, the image path, the output directory, and the text prompt with your own values: CUDA_VISIBLE_DEVICES={GPU ID} python demo/inference_on_a_image.py -c groundingdino/config/GroundingDINO_SwinT_OGC.py -p weights/groundingdino_swint_ogc.pth -i image_you_want_to_detect.jpg -o "output_dir" -t "your text prompt"
Links: Paper, Demo, Colab Demo
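The demo command above takes several positional-looking flags, so it can help to assemble it programmatically when running it over many images. The sketch below builds the argument list and environment using only the script path, flags, and CUDA_VISIBLE_DEVICES variable that appear in the Demo line. The helper name build_demo_command and its parameter names are hypothetical, and the code only constructs the command without running it.

```python
import os
import shlex


def build_demo_command(config, weights, image, output_dir, prompt, gpu_id=None):
    """Return (argv, env) for the demo script, using only the flags listed above.

    Hypothetical helper: it does not validate paths or run anything.
    """
    argv = [
        "python", "demo/inference_on_a_image.py",
        "-c", config,      # model config file
        "-p", weights,     # downloaded checkpoint
        "-i", image,       # input image
        "-o", output_dir,  # output directory
        "-t", prompt,      # text prompt describing the objects to detect
    ]
    # Copy the current environment and optionally pin the visible GPU.
    env = dict(os.environ)
    if gpu_id is not None:
        env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return argv, env


argv, env = build_demo_command(
    config="groundingdino/config/GroundingDINO_SwinT_OGC.py",
    weights="weights/groundingdino_swint_ogc.pth",
    image="image_you_want_to_detect.jpg",
    output_dir="output_dir",
    prompt="your text prompt",
    gpu_id=0,
)
# Print a shell-quoted version of the command for inspection.
print(shlex.join(argv))
```

To actually launch the demo, the pair could be passed to subprocess.run(argv, env=env, check=True) from the root of the cloned repository, assuming the weights have been downloaded to the path given.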

Highlighted Details

Reports 52.5 AP on COCO in the zero-shot setting (trained without COCO data) and 63.0 AP after fine-tuning.
Integrates with Stable Diffusion and GLIGEN for controllable image editing.
Supports CPU-only inference.
Offers a Gradio Web UI demo.

Maintenance & Community

The project is maintained by IDEA-Research and IDEA-CVR, although the Health Check section lists the last commit as one year ago and responsiveness as inactive. Related projects such as Grounded-SAM and Semantic-SAM are also available. No community support channels are explicitly listed; the project is associated with the authors' research group.

Licensing & Compatibility

The README does not explicitly state a license. Research implementations are often restricted to non-commercial use unless stated otherwise, so check the repository's license terms before commercial use or closed-source linking.

Limitations & Caveats

Training code is not yet released. The README warns of a possible NameError: name '_C' is not defined error when the installation steps are not followed strictly; the fix is to re-clone the repository and reinstall. The README's COCO zero-shot evaluation example reports 48.5 AP, which is lower than the headline figure of 52.5 AP; the two numbers may refer to different model variants, but this summary does not say which.
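Since Quick Start asks for CUDA_HOME to be set correctly before a CUDA install, a small pre-install check can catch an obvious misconfiguration early. The sketch below only verifies that the variable is set and points at an existing directory. The function name check_cuda_home is hypothetical, and this summary does not confirm that a missing CUDA_HOME is what triggers the NameError, so treat the check as a precaution rather than a diagnosis.

```python
import os


def check_cuda_home(environ=None):
    """Return (ok, message) describing whether CUDA_HOME points at a directory.

    Hypothetical helper: it checks only the environment variable, not the
    CUDA toolkit version or the PyTorch build.
    """
    environ = os.environ if environ is None else environ
    cuda_home = environ.get("CUDA_HOME")
    if not cuda_home:
        return False, "CUDA_HOME is not set"
    if not os.path.isdir(cuda_home):
        return False, f"CUDA_HOME={cuda_home!r} is not a directory"
    return True, f"CUDA_HOME={cuda_home!r}"


# Check the current process environment and report the result.
ok, message = check_cuda_home()
print(ok, message)
```

Running this before pip install -e . gives a quick signal on a GPU machine; on a CPU-only machine a False result is expected and harmless.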

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

135 stars in the last 30 days