sd-webui-segment-anything by continue-revolution

WebUI extension for Stable Diffusion image segmentation tasks

Created 2 years ago

3,520 stars

Top 13.8% on SourcePulse

1 Expert Loves This Project

Zack Li

Cofounder of Nexa AI

Project Summary

This extension integrates the Segment Anything Model (SAM) and GroundingDINO into the AUTOMATIC1111 Stable Diffusion WebUI, enabling advanced image segmentation and manipulation. It targets users seeking to enhance inpainting, semantic segmentation, automated matting, and LoRA/LyCORIS training set creation within Stable Diffusion workflows.

How It Works

The extension uses SAM to generate masks from point or box prompts, and GroundingDINO to detect objects from a text prompt so that the detected boxes can be passed to SAM for segmentation. It integrates with the Mikubill ControlNet extension, so generated masks can be used directly for ControlNet inpainting and semantic segmentation tasks. This replaces manual mask drawing and gives fine-grained control over which regions of an image are regenerated.
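The point-prompt step described above can be sketched with the upstream `segment_anything` Python package. This is a minimal illustration of how SAM itself is driven, not the extension's own code: the function names `best_mask_index` and `segment_from_point`, the default `vit_h` model type, and the checkpoint path argument are assumptions made for this example.

```python
from typing import Sequence


def best_mask_index(scores: Sequence[float]) -> int:
    """Return the index of the highest-scoring candidate mask."""
    if len(scores) == 0:
        raise ValueError("no candidate masks")
    return max(range(len(scores)), key=lambda i: scores[i])


def segment_from_point(image_rgb, checkpoint_path: str, x: int, y: int,
                       model_type: str = "vit_h"):
    """Segment the object under a single foreground click (hypothetical helper).

    image_rgb is expected to be an HxWx3 uint8 RGB NumPy array.
    """
    # Heavy dependencies are imported lazily so that best_mask_index can be
    # used without NumPy, PyTorch, or segment_anything installed.
    import numpy as np
    from segment_anything import SamPredictor, sam_model_registry

    sam = sam_model_registry[model_type](checkpoint=checkpoint_path)
    predictor = SamPredictor(sam)
    predictor.set_image(image_rgb)

    # One click at (x, y); label 1 marks foreground, 0 would mark background.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[x, y]]),
        point_labels=np.array([1]),
        multimask_output=True,  # SAM returns several candidate masks
    )
    return masks[best_mask_index(scores.tolist())]
```

With `multimask_output=True`, SAM returns several candidate masks with predicted quality scores, so the sketch keeps the highest-scoring one. A box prompt would be passed through the `box` argument of `predict` instead of the point arguments.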

Quick Start & Requirements

Install via git clone into the ${sd-webui}/extensions directory or through the WebUI's extension tab.
Requires AUTOMATIC1111 Stable Diffusion WebUI (commit 22bcc7be or later) and the Mikubill ControlNet extension.
Supports various SAM models (SAM, SAM-HQ, MobileSAM) and GroundingDINO. Models should be placed in ${sd-webui}/models/sam or ${sd-webui-segment-anything}/models/sam.
GroundingDINO and ControlNet annotator models are installed automatically on first use.
CPU inference for SAM is supported for users without compatible GPUs.
Official documentation and demos are available via links in the README.
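The install and model-placement steps above can be sketched as a small Python helper. This is an illustrative sketch, not part of the extension: `clone_command` and `find_sam_model_dir` are hypothetical names, the repository URL is inferred from the project name and author, and the order in which the two model directories are checked is an assumption.

```python
from pathlib import Path
from typing import List, Optional

# Assumed URL, inferred from the project name and author.
REPO_URL = "https://github.com/continue-revolution/sd-webui-segment-anything.git"
EXTENSION_NAME = "sd-webui-segment-anything"


def clone_command(webui_root: Path) -> List[str]:
    """Build the git command that clones the extension into extensions/."""
    target = webui_root / "extensions" / EXTENSION_NAME
    return ["git", "clone", REPO_URL, str(target)]


def find_sam_model_dir(webui_root: Path) -> Optional[Path]:
    """Return the first existing SAM model directory, or None.

    The lookup order (WebUI models folder first) is an assumption.
    """
    candidates = [
        webui_root / "models" / "sam",
        webui_root / "extensions" / EXTENSION_NAME / "models" / "sam",
    ]
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    return None
```

The command list returned by `clone_command` can be passed to `subprocess.run(..., check=True)`. Downloaded SAM checkpoints then go into whichever directory `find_sam_model_dir` reports, or into a newly created `models/sam` folder if it returns `None`.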

Highlighted Details

Supports SAM, SAM-HQ, and MobileSAM for flexible segmentation quality and performance.
Integrates with ControlNet for advanced inpainting and semantic segmentation workflows.
Offers text-to-mask generation via GroundingDINO.
Includes batch processing for automated matting and training set creation.
Provides a CPU inference option for SAM for users without a compatible GPU.

Maintenance & Community

The project appears to be in a maintenance phase with ongoing issue resolution and monitoring for new research. Community contributions and feature requests are welcomed. Links to demos and usage guides are provided.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility with commercial use or closed-source linking is not specified.

Limitations & Caveats

The README notes that the extension has not been thoroughly tested and may contain bugs. Some features, such as color inpainting and explicit mask editing, are limited by what Gradio and the WebUI support. Layout generation performs poorly on anime images. GroundingDINO installation can fail because of C++/CUDA compilation issues; a local installation option is provided as a workaround.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

4 stars in the last 30 days