DINO-X-API by IDEA-Research

Unified vision model for open-world object detection and understanding

Created 1 year ago

1,345 stars

Top 29.6% on SourcePulse


1 Expert Loves This Project

Jesse Clark

Cofounder of Marqo

Project Summary

DINO-X is a unified vision model for open-world object detection and understanding, with state-of-the-art zero-shot results reported on benchmarks such as COCO and LVIS. It targets researchers and developers who need object detection, segmentation, and phrase grounding, and it is accessed through a cloud API that can be integrated into existing AI workflows.

How It Works

DINO-X builds on the DINO architecture, adding multi-level semantic representations and support for text, visual, and customized prompts. A single model can output bounding boxes, segmentation masks, pose keypoints, and object captions together, giving a comprehensive description of each object in an image. The design emphasizes open-set detection and efficient mask generation.
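To make the idea of one model returning several outputs per object concrete, here is a minimal sketch of a per-object result record. The class, its field names, and both helper functions are hypothetical illustrations, not the DINO-X output schema; counting detections per category is shown only as one simple way to consume such results.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ObjectResult:
    """One detected object. All field names here are illustrative assumptions."""
    category: str
    score: float
    bbox: Tuple[float, float, float, float]   # (x0, y0, x1, y1) in pixels
    mask: Optional[dict] = None               # encoded mask payload; format depends on the API
    keypoints: List[Tuple[float, float]] = field(default_factory=list)  # pose keypoints, if any
    caption: Optional[str] = None             # region caption, if requested


def keep_confident(objects, min_score):
    """Drop objects below min_score and return the rest, highest score first."""
    return sorted((o for o in objects if o.score >= min_score),
                  key=lambda o: o.score, reverse=True)


def count_by_category(objects):
    """Count how many objects were detected for each category name."""
    counts = {}
    for o in objects:
        counts[o.category] = counts.get(o.category, 0) + 1
    return counts
```

Keeping every output on one record mirrors the "all outputs for an object at once" idea: filtering by score drops the box, mask, keypoints, and caption together.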

Quick Start & Requirements

Install via pip install dds-cloudapi-sdk --upgrade.
Requires an API Token obtained from the DeepDataSpace website.
Local demos are available via python demo.py and python prompt_free_demo.py.
Official documentation and examples are linked on the project's homepage.
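The Quick Start steps might look roughly like the sketch below. It assumes the `Config`, `Client`, and `V2Task` interface of `dds-cloudapi-sdk` and the `/v2/task/dinox/detection` endpoint as used in the project's examples at the time of writing. The request fields, the `"."`-joined text prompt, the `"DINO-X-1.0"` model name, and the result keys (`objects`, `category`, `score`, `bbox`) are assumptions to verify against the official documentation, and the environment variable names are hypothetical.

```python
import os


def build_detection_body(image_url, categories, targets=("bbox", "mask"),
                         bbox_threshold=0.25, iou_threshold=0.8):
    """Build a request body for open-set detection.

    Field names and defaults are assumptions based on published examples;
    check them against the official API documentation.
    """
    if not categories:
        raise ValueError("at least one category name is required")
    return {
        "model": "DINO-X-1.0",             # assumed model identifier
        "image": image_url,                # URL returned by client.upload_file()
        "prompt": {"type": "text", "text": ".".join(categories)},  # assumed separator
        "targets": list(targets),          # which outputs to return per object
        "bbox_threshold": bbox_threshold,  # assumed: minimum box confidence
        "iou_threshold": iou_threshold,    # assumed: IoU threshold for duplicate-box suppression
    }


def run_detection(token, image_path, categories):
    """Upload a local image and run detection through the cloud API."""
    # Imported lazily so build_detection_body() works without the SDK installed.
    from dds_cloudapi_sdk import Client, Config
    from dds_cloudapi_sdk.tasks.v2_task import V2Task

    client = Client(Config(token))
    image_url = client.upload_file(image_path)
    task = V2Task(api_path="/v2/task/dinox/detection",
                  api_body=build_detection_body(image_url, categories))
    client.run_task(task)  # assumed to wait until the task completes
    return task.result["objects"]


if __name__ == "__main__":
    token = os.environ.get("DDS_API_TOKEN")     # hypothetical variable name
    image_path = os.environ.get("DINOX_IMAGE")  # hypothetical variable name
    if token and image_path:
        for obj in run_detection(token, image_path, ["person", "car"]):
            print(obj["category"], obj["score"], obj["bbox"])
    else:
        print("Set DDS_API_TOKEN and DINOX_IMAGE to run the demo.")
```

Separating request construction from the network call keeps the payload easy to inspect and test without an API token.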

Highlighted Details

Achieves SOTA zero-shot detection performance: 56.0 AP on COCO, 59.8 AP on LVIS-minival, and 52.4 AP on LVIS-val.
Demonstrates strong performance on rare classes (e.g., 63.3 AP on LVIS-minival rare classes).
Supports diverse tasks: Open-Set Detection/Segmentation, Phrase Grounding, Visual-Prompt Counting, Pose Estimation, Region Captioning.
Integrates with AI tools such as Cursor and Claude via the DINO-X MCP Server.

Maintenance & Community

The project is developed by IDEA Research. Recent updates include improved mask encoding and the release of the DINO-X MCP Server. Further details and demos can be found on the DeepDataSpace platform.

Licensing & Compatibility

Licensed under the Apache 2.0 license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

While DINO-X achieves strong detection performance, its segmentation performance on benchmarks such as SGinW shows a notable gap compared to models like Grounded SAM 2. The project recommends Grounded SAM 2 for applications that need both object segmentation and tracking.

Health Check

Last Commit

7 months ago

Responsiveness

1 day


Star History

15 stars in the last 30 days