Grounded-Segment-Anything by IDEA-Research

Framework for open-world visual tasks, combining multiple models

Created 2 years ago
16,920 stars

Top 2.8% on SourcePulse

Project Summary

This project provides a pipeline for open-world object detection and segmentation that combines Grounding DINO and Segment Anything (SAM). It targets researchers and developers who need flexible tooling for visual tasks, and it supports text-guided detection, segmentation, and image generation and editing.

How It Works

The core approach chains Grounding DINO for text-based object detection (outputting bounding boxes and class labels) with SAM for precise segmentation of detected objects. This modular design allows for easy integration of other models, such as Stable Diffusion for inpainting or RAM/Tag2Text for automatic image labeling, creating versatile workflows for complex visual tasks.
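The chaining described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's own code: `Detection`, `ground_then_segment`, `fake_detect`, and `fake_segment` are hypothetical names, and the detector and segmenter are passed in as plain callables so the data flow is visible without loading any model. It assumes the detector returns boxes as normalized (cx, cy, w, h) and that the segmenter expects pixel (x0, y0, x1, y1), so a conversion step sits between the two stages.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Tuple

Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    label: str     # phrase matched from the text prompt
    score: float   # detector confidence
    box_xyxy: Box  # pixel coordinates (x0, y0, x1, y1)


def cxcywh_norm_to_xyxy(box: Box, width: int, height: int) -> Box:
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * width,
        (cy - h / 2) * height,
        (cx + w / 2) * width,
        (cy + h / 2) * height,
    )


def ground_then_segment(
    image: Any,
    prompt: str,
    size: Tuple[int, int],
    detect: Callable[[Any, str], Iterable[Tuple[str, float, Box]]],
    segment: Callable[[Any, Box], Any],
    min_score: float = 0.3,
) -> List[Tuple[Detection, Any]]:
    """Run text-prompted detection, then segment each box that passes the threshold."""
    width, height = size
    results = []
    for label, score, box in detect(image, prompt):
        if score < min_score:
            continue  # drop low-confidence detections before segmentation
        det = Detection(label, score, cxcywh_norm_to_xyxy(box, width, height))
        results.append((det, segment(image, det.box_xyxy)))
    return results


# Hypothetical stand-ins for the real models, used only to show the data flow.
def fake_detect(image: Any, prompt: str):
    return [("dog", 0.9, (0.5, 0.5, 0.5, 0.5)), ("cat", 0.1, (0.2, 0.2, 0.1, 0.1))]


def fake_segment(image: Any, box: Box):
    return {"box": box}  # placeholder where a real segmenter would return a mask


results = ground_then_segment(None, "dog . cat", (200, 100), fake_detect, fake_segment)
```

In a real setup the two callables would wrap the Grounding DINO and SAM inference code; keeping them as parameters is also what makes it straightforward to swap in another model at either stage.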

Quick Start & Requirements

  • Install: pip install -e . (after cloning and setting up dependencies). Docker installation is also provided.
  • Prerequisites: Python >= 3.8, PyTorch >= 1.7, TorchVision >= 0.8. CUDA support is strongly recommended.
  • Resources: Requires downloading pre-trained weights for Grounding DINO and SAM (e.g., sam_vit_h_4b8939.pth).
  • Docs: Grounded SAM
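
Before running anything, it can help to confirm that the prerequisites listed above are actually in place. The following is a hedged sketch using only the Python standard library; `parse_version`, `meets_minimum`, and `check_environment` are hypothetical helper names, and the weights directory is whatever location you chose for the downloaded checkpoints.

```python
import re
import sys
from importlib import metadata
from pathlib import Path
from typing import Iterable, List, Tuple

# Minimum versions taken from the prerequisites listed above.
MINIMUMS = {"torch": (1, 7), "torchvision": (0, 8)}


def parse_version(text: str) -> Tuple[int, ...]:
    """Return the leading numeric components, ignoring suffixes such as '+cu117'."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: Tuple[int, ...]) -> bool:
    """Compare an installed version string against a minimum version tuple."""
    return parse_version(installed) >= minimum


def check_environment(weights_dir: str, checkpoints: Iterable[str]) -> List[str]:
    """Collect human-readable problems; an empty list means nothing was flagged."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python >= 3.8 is required")
    for package, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{package} {installed} is older than the minimum")
    for name in checkpoints:
        if not (Path(weights_dir) / name).is_file():
            problems.append(f"missing checkpoint: {name}")
    return problems
```

For example, `check_environment("weights", ["sam_vit_h_4b8939.pth"])` returns a list that includes a "missing checkpoint" entry until the SAM weights have been downloaded into `weights/`.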

Highlighted Details

  • Integrates with SAM-HQ for higher quality segmentation.
  • Supports inpainting tasks by combining detection, segmentation, and Stable Diffusion.
  • Enables automatic labeling pipelines using models like RAM, Tag2Text, and BLIP.
  • Extends to audio-driven segmentation via Whisper.
  • Offers integrations for 3D mesh recovery (OSX) and object tracking (VISAM).
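
As an illustration of the automatic labeling idea above, the glue between a tagging model and the detector can be as small as turning a list of tags into a text prompt. The sketch below assumes the tags are already available as Python strings and that the detector accepts category names separated by periods; `tags_to_prompt` is a hypothetical helper, not part of the project.

```python
from typing import Iterable, List


def tags_to_prompt(tags: Iterable[str]) -> str:
    """Build a period-separated detection prompt from free-form tags.

    Tags are lowercased, stripped, and de-duplicated while keeping their order.
    """
    unique: List[str] = []
    for tag in tags:
        cleaned = tag.strip().lower()
        if cleaned and cleaned not in unique:
            unique.append(cleaned)
    if not unique:
        return ""  # nothing to detect
    return " . ".join(unique) + " ."


prompt = tags_to_prompt(["Dog", " cat ", "dog", ""])
```

The resulting string can then be passed to the detection stage, whose boxes are handed on to segmentation as in the main pipeline.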

Maintenance & Community

The project is developed by IDEA Research and highlights numerous community extensions, though the most recent commit was about a year ago (see Health Check). Links to Hugging Face demos and a technical report on arXiv are available.

Licensing & Compatibility

The project's components are derived from other open-source projects, each with its own license. Grounding DINO and SAM are distributed under permissive licenses (e.g., MIT, Apache 2.0), which generally allow commercial use and integration into closed-source projects. Check the license of each additional component before redistributing a combined pipeline.

Limitations & Caveats

While highly versatile, the setup involves managing multiple large model checkpoints. Some advanced features, like those involving ChatGPT, require API keys and may incur costs. The project is a research endeavor, and stability for production use may vary.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 141 stars in the last 30 days
