GroundingDINO by IDEA-Research

Object detection via grounded pre-training (research paper implementation)

Created 2 years ago
8,902 stars

Top 5.7% on SourcePulse

Project Summary

Grounding DINO is an open-source PyTorch implementation for open-set object detection, enabling users to detect any object specified by natural language prompts. It is designed for researchers and developers working on advanced computer vision tasks, offering high performance and flexibility for applications like image editing and automated annotation.

How It Works

Grounding DINO combines the DINO (DETR with Improved DeNoising Anchor Boxes) object detection framework with grounded pre-training, so it can localize objects from textual descriptions. Its strong zero-shot performance comes from five components: a text backbone, an image backbone, a feature enhancer, language-guided query selection, and a cross-modality decoder.
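To make the language-guided query selection step more concrete, here is a minimal pure-Python sketch of the general idea: score each image token by its best similarity to any text token, then keep the highest-scoring tokens as decoder queries. The function name select_queries, the toy feature vectors, and the use of a plain dot product are illustrative assumptions, not the repository's code, which operates on batched tensors.

```python
def select_queries(image_feats, text_feats, num_query):
    """Return indices of the image tokens most similar to any text token.

    Hypothetical helper for illustration only; features are plain lists of floats.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # For each image token, keep its best similarity over all text tokens.
    best = [max(dot(img, txt) for txt in text_feats) for img in image_feats]
    # Rank image tokens by that score (highest first) and keep the top num_query.
    ranked = sorted(range(len(best)), key=lambda i: best[i], reverse=True)
    return ranked[:num_query]


# Toy example: four image tokens, two text tokens, two-dimensional features.
image_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
text_feats = [[1.0, 0.0], [0.2, 0.0]]
selected = select_queries(image_feats, text_feats, num_query=2)
print(selected)
```

The selected indices would then seed the queries passed to the cross-modality decoder, so the decoder starts from image regions that already match the prompt.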

Quick Start & Requirements

  • Install: pip install -e . within the cloned repository.
  • Prerequisites: Python, PyTorch. CUDA is recommended for GPU acceleration; CPU-only mode is supported. Ensure CUDA_HOME is set correctly if using CUDA.
  • Model Weights: Download groundingdino_swint_ogc.pth from the releases page.
  • Demo: Run inference with CUDA_VISIBLE_DEVICES={GPU ID} python demo/inference_on_a_image.py -c groundingdino/config/GroundingDINO_SwinT_OGC.py -p weights/groundingdino_swint_ogc.pth -i image_you_want_to_detect.jpg -o "output_dir" -t "your text prompt".
  • Resources: Requires downloading pre-trained model weights.
  • Links: Paper, Demo, Colab Demo
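The demo command above can also be assembled programmatically, which avoids shell-quoting problems when the text prompt contains spaces. The sketch below only builds the argument list and environment; build_demo_command and demo_environment are hypothetical helper names, and the default paths simply mirror the ones in the demo command.

```python
import os
import shlex


def build_demo_command(image_path, text_prompt, output_dir,
                       config="groundingdino/config/GroundingDINO_SwinT_OGC.py",
                       weights="weights/groundingdino_swint_ogc.pth"):
    """Return the demo invocation as an argument list (hypothetical helper)."""
    return [
        "python", "demo/inference_on_a_image.py",
        "-c", config,       # model config file
        "-p", weights,      # downloaded checkpoint
        "-i", image_path,   # input image
        "-o", output_dir,   # directory for the output
        "-t", text_prompt,  # natural-language prompt
    ]


def demo_environment(gpu_id, base_env=None):
    """Copy the environment and pin the process to one GPU via CUDA_VISIBLE_DEVICES."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


cmd = build_demo_command("image_you_want_to_detect.jpg", "your text prompt", "output_dir")
print(shlex.join(cmd))  # shell-safe rendering of the command
```

To actually run the demo, pass the list to subprocess.run(cmd, env=demo_environment(0), check=True) from the root of the cloned repository, after installing the package and downloading the weights.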

Highlighted Details

  • Achieves 52.5 AP on COCO in the zero-shot setting (trained without COCO data) and 63.0 AP when fine-tuned on COCO.
  • Integrates with Stable Diffusion and GLIGEN for controllable image editing.
  • Supports CPU-only inference.
  • Offers a Gradio Web UI demo.

Maintenance & Community

The project is maintained by IDEA-Research and IDEA-CVR. Related projects such as Grounded-SAM and Semantic-SAM are also available. No community support channels are listed; the project is tied to the authors' research group.

Licensing & Compatibility

The README does not explicitly state a license. Research implementations are sometimes restricted to non-commercial use, so verify the repository's license terms before commercial use or closed-source linking.

Limitations & Caveats

Training code has not been released. The README warns that NameError: name '_C' is not defined can occur if the installation steps are not followed exactly; the fix is to re-clone the repository and reinstall. The README also reports a COCO zero-shot result of 48.5 AP, which differs from the 52.5 AP headline figure.
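One way to reduce the chance of a broken install is to check CUDA_HOME before running pip install -e . on a GPU machine. The sketch below is a hypothetical pre-flight check; it assumes that an unset or mis-set CUDA_HOME is one of the installation mistakes behind the _C error, which the summary above does not state explicitly.

```python
import os


def check_cuda_home(env=None):
    """Classify the CUDA_HOME setting (hypothetical pre-install check).

    Returns "unset", "missing-directory", or "ok".
    """
    env = os.environ if env is None else env
    cuda_home = env.get("CUDA_HOME")
    if not cuda_home:
        # Variable absent or empty: fine for CPU-only use, a problem for GPU builds.
        return "unset"
    if not os.path.isdir(cuda_home):
        # Variable set, but it does not point at an existing directory.
        return "missing-directory"
    return "ok"


print(check_cuda_home())
```

A result other than "ok" on a GPU machine suggests fixing the variable first, then re-cloning and reinstalling as the README advises.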

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 1
  • Issues (30d): 3
  • Star history: 207 stars in the last 30 days