sd-webui-segment-anything  by continue-revolution

WebUI extension for Stable Diffusion image segmentation tasks

created 2 years ago
3,509 stars

Top 14.1% on sourcepulse

Project Summary

This extension integrates the Segment Anything Model (SAM) and GroundingDINO into the AUTOMATIC1111 Stable Diffusion WebUI, enabling advanced image segmentation and manipulation. It targets users seeking to enhance inpainting, semantic segmentation, automated matting, and LoRA/LyCORIS training set creation within Stable Diffusion workflows.

How It Works

The extension uses SAM to generate masks from point or box prompts, and GroundingDINO for text-guided object detection whose boxes can then drive segmentation. It also integrates with the Mikubill ControlNet extension, so generated masks can be passed directly to ControlNet inpainting and semantic segmentation. This automates much of the manual masking work and gives finer control over which image regions are regenerated.
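To make the text-to-box-to-mask hand-off concrete, here is a minimal sketch of the coordinate conversion that typically sits between the two models. It assumes the detector reports boxes as normalized (center x, center y, width, height) and that the segmenter's box prompt expects pixel corner coordinates (x0, y0, x1, y1). The function name and the clamping step are hypothetical and are not taken from the extension's code.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]


def cxcywh_norm_to_xyxy_pixels(box: Box, image_width: int, image_height: int) -> Box:
    """Convert a normalized (cx, cy, w, h) box to pixel (x0, y0, x1, y1).

    Hypothetical helper for illustration only: it assumes the detector
    output is normalized to [0, 1] and clamps the result to the image.
    """
    cx, cy, w, h = box
    x0 = (cx - w / 2) * image_width
    y0 = (cy - h / 2) * image_height
    x1 = (cx + w / 2) * image_width
    y1 = (cy + h / 2) * image_height

    # Clamp so the box prompt never extends past the image borders.
    x0 = min(max(x0, 0.0), float(image_width))
    y0 = min(max(y0, 0.0), float(image_height))
    x1 = min(max(x1, 0.0), float(image_width))
    y1 = min(max(y1, 0.0), float(image_height))
    return (x0, y0, x1, y1)


if __name__ == "__main__":
    # A centered box covering half the width and height of a 200x100 image.
    print(cxcywh_norm_to_xyxy_pixels((0.5, 0.5, 0.5, 0.5), 200, 100))
```

The resulting corner box is the kind of value that would be handed to a box-prompted segmenter; a point prompt would skip this step entirely.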

Quick Start & Requirements

  • Install via git clone into the ${sd-webui}/extensions directory or through the WebUI's extension tab.
  • Requires AUTOMATIC1111 Stable Diffusion WebUI (commit 22bcc7be or later) and the Mikubill ControlNet extension.
  • Supports various SAM models (SAM, SAM-HQ, MobileSAM) and GroundingDINO. Models should be placed in ${sd-webui}/models/sam or ${sd-webui-segment-anything}/models/sam.
  • GroundingDINO and ControlNet annotator models are installed automatically on first use.
  • CPU inference for SAM is supported for users without compatible GPUs.
  • Official documentation and demos are available via links in the README.
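As a quick sanity check on the model locations listed above, the following sketch looks for checkpoint files in both candidate directories. It is an assumption-laden illustration: the function name is hypothetical, the extension is assumed to live at extensions/sd-webui-segment-anything under the WebUI root (the default name a plain git clone would produce), and the `.pth`/`.pt` suffix filter is a guess at typical checkpoint file extensions rather than something the summary specifies.

```python
from pathlib import Path
from typing import List

# Assumed checkpoint suffixes; adjust if your model files use another extension.
CHECKPOINT_SUFFIXES = {".pth", ".pt"}


def find_sam_checkpoints(
    webui_root: Path,
    extension_dir_name: str = "sd-webui-segment-anything",  # assumed clone directory name
) -> List[Path]:
    """Return checkpoint files found in the two documented model directories."""
    candidate_dirs = [
        webui_root / "models" / "sam",
        webui_root / "extensions" / extension_dir_name / "models" / "sam",
    ]
    found: List[Path] = []
    for directory in candidate_dirs:
        if not directory.is_dir():
            continue  # a missing directory simply contributes no models
        found.extend(
            sorted(
                p
                for p in directory.iterdir()
                if p.is_file() and p.suffix in CHECKPOINT_SUFFIXES
            )
        )
    return found


if __name__ == "__main__":
    for checkpoint in find_sam_checkpoints(Path(".")):
        print(checkpoint)
```

An empty result suggests the SAM model files still need to be downloaded and placed in one of the two directories.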

Highlighted Details

  • Supports SAM, SAM-HQ, and MobileSAM for flexible segmentation quality and performance.
  • Integrates with ControlNet for advanced inpainting and semantic segmentation workflows.
  • Offers text-to-mask generation via GroundingDINO.
  • Includes batch processing for automated matting and training set creation.
  • Provides a CPU inference option for SAM on systems without a compatible GPU.
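The matting idea behind the batch feature can be sketched in a few lines: a binary mask becomes the alpha channel of the image, so everything outside the mask turns transparent. The code below is a simplified, dependency-free illustration using nested lists as images. The function names are hypothetical, and the extension's actual batch pipeline is not shown in this summary.

```python
from typing import List, Sequence, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def apply_mask_as_alpha(image: List[List[RGB]], mask: List[List[bool]]) -> List[List[RGBA]]:
    """Return an RGBA copy of `image` that is opaque only where `mask` is True."""
    if len(image) != len(mask) or any(len(r) != len(m) for r, m in zip(image, mask)):
        raise ValueError("image and mask must have the same shape")
    return [
        [(r, g, b, 255 if keep else 0) for (r, g, b), keep in zip(image_row, mask_row)]
        for image_row, mask_row in zip(image, mask)
    ]


def matte_batch(
    images: Sequence[List[List[RGB]]],
    masks: Sequence[List[List[bool]]],
) -> List[List[List[RGBA]]]:
    """Apply one mask per image; a stand-in for a batch matting loop."""
    if len(images) != len(masks):
        raise ValueError("need exactly one mask per image")
    return [apply_mask_as_alpha(img, msk) for img, msk in zip(images, masks)]


if __name__ == "__main__":
    tiny_image = [[(10, 20, 30), (40, 50, 60)]]
    tiny_mask = [[True, False]]
    print(matte_batch([tiny_image], [tiny_mask]))
```

In practice the masks would come from the segmentation step and the images would be arrays rather than nested lists, but the per-pixel rule is the same.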

Maintenance & Community

The project appears to be in a maintenance phase with ongoing issue resolution and monitoring for new research. Community contributions and feature requests are welcomed. Links to demos and usage guides are provided.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility with commercial use or closed-source linking is not specified.

Limitations & Caveats

The extension is described as not thoroughly tested and may contain bugs. Some features, such as color inpainting and explicit mask editing, are limited by what Gradio and the WebUI support. Layout generation performs poorly on anime images. GroundingDINO installation can fail because of C++/CUDA compilation issues; a local installation option is provided as a workaround.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 34 stars in the last 90 days
