DINO-X-API by IDEA-Research

Unified vision model for open-world object detection and understanding

created 8 months ago
1,176 stars

Top 33.0% on SourcePulse

Project Summary

DINO-X is a unified vision model designed for open-world object detection and understanding, offering state-of-the-art performance on various benchmarks. It targets researchers and developers needing advanced capabilities in object detection, segmentation, and phrase grounding, enabling flexible integration into AI workflows.

How It Works

DINO-X builds upon the DINO architecture, enhancing it with multi-level semantic representations and diverse input prompt capabilities (text, visual, custom). This allows for simultaneous output of bounding boxes, segmentation masks, pose keypoints, and object captions, providing a comprehensive understanding of objects within an image. Its approach emphasizes open-set detection and efficient mask generation.
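To make the idea of a single model emitting several outputs per object more concrete, here is a minimal sketch of how a caller might represent one detected object. The class name `DetectedObject`, its fields, and the helper `filter_by_score` are hypothetical and chosen only for illustration; they are not the schema returned by DINO-X or its SDK.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DetectedObject:
    """Hypothetical container for the per-object outputs described above."""
    category: str                                  # label from a text or visual prompt
    score: float                                   # detection confidence in [0, 1]
    bbox: Tuple[float, float, float, float]        # (x1, y1, x2, y2) in pixels
    mask_rle: Optional[str] = None                 # encoded segmentation mask, if requested
    keypoints: List[Tuple[float, float]] = field(default_factory=list)  # pose keypoints
    caption: Optional[str] = None                  # region caption, if requested


def filter_by_score(objects: List[DetectedObject], threshold: float = 0.25) -> List[DetectedObject]:
    """Keep only objects whose confidence is at or above the threshold."""
    return [obj for obj in objects if obj.score >= threshold]


objects = [
    DetectedObject("person", 0.91, (12.0, 30.0, 140.0, 310.0), caption="a person walking"),
    DetectedObject("dog", 0.18, (200.0, 220.0, 280.0, 300.0)),
]
confident = filter_by_score(objects)
print([obj.category for obj in confident])
```

Keeping optional outputs (mask, keypoints, caption) as optional fields reflects that a caller may request only a subset of tasks for a given image.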

Quick Start & Requirements

  • Install via pip install dds-cloudapi-sdk --upgrade.
  • Requires an API Token obtained from the DeepDataSpace website.
  • Local demos are available via python demo.py and python prompt_free_demo.py.
  • Official documentation and examples are linked on the project's homepage.
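Because the API requires a token, one reasonable pattern is to keep it out of source code and read it from the environment before running any demo. The sketch below uses only the Python standard library; the environment variable name `DDS_API_TOKEN` and the helper `load_api_token` are assumptions made for this example, not names defined by the SDK.

```python
import os


def load_api_token(env_var: str = "DDS_API_TOKEN") -> str:
    """Return the API token stored in the given environment variable.

    The variable name is a hypothetical convention for this sketch.
    Raises RuntimeError if the variable is unset or blank.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"No API token found in ${env_var}; obtain one from the "
            "DeepDataSpace website and export it before running the demos."
        )
    return token
```

A script would call `load_api_token()` once at startup and pass the result to whatever client object the SDK documentation describes.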

Highlighted Details

  • Achieves SOTA zero-shot detection performance: 56.0 AP on COCO, 59.8 AP on LVIS-minival, and 52.4 AP on LVIS-val.
  • Demonstrates strong performance on rare classes (e.g., 63.3 AP on LVIS-minival rare classes).
  • Supports diverse tasks: Open-Set Detection/Segmentation, Phrase Grounding, Visual-Prompt Counting, Pose Estimation, Region Captioning.
  • Integrates with AI tools like Cursor and Claude via DINO-X MCP Server.
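For readers less familiar with the AP numbers above: COCO-style AP is averaged over intersection-over-union (IoU) thresholds from 0.50 to 0.95, so the IoU between a predicted box and a ground-truth box is the basic quantity behind those scores. The function below is a generic sketch of box IoU, not code taken from the project; the name `box_iou` and the `(x1, y1, x2, y2)` box layout are assumptions for the example.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), with x2 >= x1 and y2 >= y1


def box_iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b

    # Width and height of the overlap region, clamped at zero when boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


# Two 10x10 boxes offset by 5 pixels horizontally:
# intersection = 5 * 10 = 50, union = 100 + 100 - 50 = 150, IoU = 1/3.
print(box_iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

At an IoU threshold of 0.50, the pair in the example would not count as a match, which illustrates why AP averaged up to 0.95 rewards tightly localized boxes.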

Maintenance & Community

The project is developed by IDEA Research. Recent updates include improved mask encoding and the release of the DINO-X MCP Server. Further details and demos can be found on the DeepDataSpace platform.

Licensing & Compatibility

Licensed under the Apache 2.0 license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

While DINO-X achieves strong detection performance, its segmentation performance on benchmarks like SGinW shows a notable gap compared to models like Grounded SAM 2. The project recommends Grounded SAM 2 for simultaneous object segmentation and tracking needs.

Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 2
  • Star History: 49 stars in the last 30 days
