DINO-X-API by IDEA-Research

Unified vision model for open-world object detection and understanding

Created 1 year ago
1,345 stars

Top 29.6% on SourcePulse

Project Summary

DINO-X is a unified vision model for open-world object detection and understanding, reporting state-of-the-art zero-shot detection results on COCO and LVIS. It targets researchers and developers who need object detection, segmentation, and phrase grounding that can be integrated into existing AI workflows.

How It Works

DINO-X builds upon the DINO architecture, enhancing it with multi-level semantic representations and diverse input prompt capabilities (text, visual, custom). This allows for simultaneous output of bounding boxes, segmentation masks, pose keypoints, and object captions, providing a comprehensive understanding of objects within an image. Its approach emphasizes open-set detection and efficient mask generation.
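
To make the multi-output idea concrete, here is a minimal sketch of how one detected object could be represented on the client side. The DetectedObject class, its field names, and the filter_by_score helper are hypothetical and chosen only for illustration; they are not the schema returned by the DINO-X API or the SDK.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class DetectedObject:
    """Hypothetical container for one detected object (not the SDK's schema)."""

    category: str                                # open-set label, e.g. taken from a text prompt
    score: float                                 # detection confidence in [0, 1]
    bbox: Tuple[float, float, float, float]      # (x_min, y_min, x_max, y_max) in pixels
    mask: Optional[List[List[int]]] = None       # binary mask rows, if segmentation was requested
    keypoints: List[Tuple[float, float]] = field(default_factory=list)  # pose keypoints, if any
    caption: Optional[str] = None                # region caption, if requested

    def area(self) -> float:
        """Return the bounding-box area, clamped to zero for degenerate boxes."""
        x_min, y_min, x_max, y_max = self.bbox
        return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def filter_by_score(objects: List[DetectedObject], threshold: float) -> List[DetectedObject]:
    """Keep only objects whose confidence is at or above the threshold."""
    return [obj for obj in objects if obj.score >= threshold]
```

Keeping the optional outputs (mask, keypoints, caption) on the same record as the box mirrors the description above, where a single pass yields several kinds of output per object.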

Quick Start & Requirements

  • Install the SDK with pip install dds-cloudapi-sdk --upgrade.
  • Requires an API Token obtained from the DeepDataSpace website.
  • Local demos are available via python demo.py and python prompt_free_demo.py.
  • Official documentation and examples are linked on the project's homepage.
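
Before running a demo, it can help to confirm that the two prerequisites above are in place. The sketch below checks whether the SDK is importable and reads the API token from an environment variable. The variable name DDS_API_TOKEN and the import name dds_cloudapi_sdk are assumptions made for this example, and no SDK functions are called; consult the official documentation for the actual client usage.

```python
import importlib.util
import os
from typing import Mapping, Optional


def sdk_installed(module_name: str = "dds_cloudapi_sdk") -> bool:
    """Return True if the module can be found on the import path.

    The default module name is an assumption about how the
    dds-cloudapi-sdk package is imported.
    """
    return importlib.util.find_spec(module_name) is not None


def load_api_token(
    env_var: str = "DDS_API_TOKEN",  # hypothetical variable name, not defined by the project
    environ: Optional[Mapping[str, str]] = None,
) -> str:
    """Read the API token from the environment and fail loudly if it is missing."""
    env = os.environ if environ is None else environ
    token = env.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set {env_var} to the API token obtained from the DeepDataSpace website."
        )
    return token
```

Reading the token from the environment keeps it out of source control; the returned string can then be passed to whatever client configuration the SDK documents.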

Highlighted Details

  • Achieves SOTA zero-shot detection performance: 56.0 AP on COCO, 59.8 AP on LVIS-minival, and 52.4 AP on LVIS-val.
  • Demonstrates strong performance on rare classes (e.g., 63.3 AP on LVIS-minival rare classes).
  • Supports diverse tasks: Open-Set Detection/Segmentation, Phrase Grounding, Visual-Prompt Counting, Pose Estimation, Region Captioning.
  • Integrates with AI tools like Cursor and Claude via DINO-X MCP Server.

Maintenance & Community

The project is developed by IDEA Research. Recent updates include improved mask encoding and the release of the DINO-X MCP Server. Further details and demos can be found on the DeepDataSpace platform.

Licensing & Compatibility

Licensed under the Apache 2.0 license, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

While DINO-X achieves strong detection performance, its segmentation performance on benchmarks such as SGinW shows a notable gap compared to models like Grounded SAM 2. For use cases that need object segmentation and tracking together, the project recommends Grounded SAM 2.

Health Check

  • Last commit: 7 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 15 stars in the last 30 days
