Grounded-Segment-Anything by IDEA-Research

Framework for open-world visual tasks, combining multiple models

created 2 years ago
16,697 stars

Top 2.8% on sourcepulse

Project Summary

This project provides a pipeline for open-world object detection and segmentation that combines Grounding DINO and Segment Anything (SAM). It targets researchers and developers who need flexible building blocks for visual tasks: text-guided detection, segmentation, and image generation or editing.

How It Works

The core approach chains Grounding DINO for text-based object detection (outputting bounding boxes and class labels) with SAM for precise segmentation of detected objects. This modular design allows for easy integration of other models, such as Stable Diffusion for inpainting or RAM/Tag2Text for automatic image labeling, creating versatile workflows for complex visual tasks.
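The chaining described above can be sketched in a few lines. This is a minimal illustration, not the project's own code: `Detection`, `cxcywh_to_xyxy`, and `grounded_segment` are hypothetical names, and `detect` and `segment` are placeholder callables standing in for a Grounding DINO wrapper and a SAM wrapper. It assumes the detector returns boxes as normalized (center-x, center-y, width, height) values, which then need converting to pixel corner coordinates before being used as box prompts for the segmenter.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple

Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    """One detector result (hypothetical container, not a library type)."""
    label: str
    score: float
    box_cxcywh: Box  # assumed normalized (cx, cy, w, h), each in [0, 1]


def cxcywh_to_xyxy(box: Box, width: int, height: int) -> Box:
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1)."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * width,
        (cy - h / 2) * height,
        (cx + w / 2) * width,
        (cy + h / 2) * height,
    )


def grounded_segment(
    image: Any,
    image_size: Tuple[int, int],
    prompt: str,
    detect: Callable[[Any, str], List[Detection]],
    segment: Callable[[Any, Box], Any],
    min_score: float = 0.3,  # illustrative threshold, not a project default
) -> List[Tuple[str, Box, Any]]:
    """Run text-guided detection, then segment each box that passes the threshold."""
    width, height = image_size
    results = []
    for det in detect(image, prompt):
        if det.score < min_score:
            continue  # drop low-confidence detections before segmenting
        box = cxcywh_to_xyxy(det.box_cxcywh, width, height)
        results.append((det.label, box, segment(image, box)))
    return results
```

Because the detector and segmenter are passed in as plain callables, either stage can be swapped (for example, a higher-quality segmenter) without touching the chaining logic.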

Quick Start & Requirements

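  • Install: clone the repository, then install the bundled packages in editable mode with pip (the repository README uses python -m pip install -e segment_anything and pip install --no-build-isolation -e GroundingDINO). A Docker setup is also provided.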
  • Prerequisites: Python >= 3.8, PyTorch >= 1.7, TorchVision >= 0.8. CUDA support is strongly recommended.
  • Resources: Requires downloading pre-trained weights for Grounding DINO and SAM (e.g., sam_vit_h_4b8939.pth).
  • Docs: Grounded SAM
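Before running any demo, it can help to verify the prerequisites listed above. The following is a small preflight sketch using only the standard library; the helper names are hypothetical, and the Grounding DINO checkpoint filename in the usage note is an assumption (only the SAM filename appears in this summary).

```python
import sys
from pathlib import Path
from typing import Iterable, List, Sequence, Tuple

MIN_PYTHON = (3, 8)  # from the prerequisites above


def python_ok(version_info: Sequence[int] = sys.version_info) -> bool:
    """True if the interpreter meets the minimum Python version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def version_at_least(version: str, minimum: Tuple[int, int]) -> bool:
    """Compare a version string such as '1.13.1+cu117' against (major, minor)."""
    core = version.split("+")[0]  # strip a local build suffix like '+cu117'
    major_minor = tuple(int(part) for part in core.split(".")[:2])
    return major_minor >= minimum


def missing_checkpoints(weights_dir: str, names: Iterable[str]) -> List[str]:
    """Return the checkpoint filenames that are not present in weights_dir."""
    root = Path(weights_dir)
    return [name for name in names if not (root / name).is_file()]
```

A typical call might be `missing_checkpoints("weights", ["sam_vit_h_4b8939.pth", "groundingdino_swint_ogc.pth"])`, with `version_at_least(torch.__version__, (1, 7))` checked once PyTorch is importable.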

Highlighted Details

  • Integrates with SAM-HQ for higher quality segmentation.
  • Supports inpainting tasks by combining detection, segmentation, and Stable Diffusion.
  • Enables automatic labeling pipelines using models like RAM, Tag2Text, and BLIP.
  • Extends to audio-driven segmentation via Whisper.
  • Offers integrations for 3D mesh recovery (OSX) and object tracking (VISAM).
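As an illustration of the automatic labeling idea, the glue between a tagging model and the detector can be as simple as turning a list of tags into a text prompt. The sketch below assumes the dot-separated, lowercase prompt convention commonly used with Grounding DINO; `tags_to_prompt` is a hypothetical helper, and the tags would come from a model such as RAM or Tag2Text.

```python
from typing import Iterable, List


def tags_to_prompt(tags: Iterable[str]) -> str:
    """Build a dot-separated detection prompt from tagger output.

    Tags are stripped, lowercased, and de-duplicated in first-seen order.
    """
    seen: List[str] = []
    for tag in tags:
        cleaned = tag.strip().lower()
        if cleaned and cleaned not in seen:
            seen.append(cleaned)
    if not seen:
        return ""
    return " . ".join(seen) + " ."
```

The resulting string can be passed as the text prompt to the detection stage, so that each tag becomes a candidate category for boxes and masks.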

Maintenance & Community

The project is developed by IDEA Research and has a history of frequent updates, although the most recent commit listed in the health check is 11 months old. The repository highlights numerous community extensions and links to Hugging Face demos and a technical report on arXiv.

Licensing & Compatibility

The project's components are derived from other open-source projects, each with its own license. Grounding DINO and SAM are both released under Apache 2.0, which permits commercial use and integration into closed-source projects. Other integrated models, such as Stable Diffusion, carry their own license terms and should be checked separately.

Limitations & Caveats

While highly versatile, the setup involves managing multiple large model checkpoints. Some advanced features, like those involving ChatGPT, require API keys and may incur costs. The project is a research endeavor, and stability for production use may vary.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 2

Star History

532 stars in the last 90 days
