Unsupervised zero-shot segmentation method using Stable Diffusion attention
Top 86.5% on sourcepulse
DiffSeg is an unsupervised, zero-shot image segmentation method that leverages the attention maps of a pre-trained Stable Diffusion model. It targets researchers and practitioners in computer vision who need segmentation without labeled datasets, producing masks for arbitrary images without any training or prompts.
How It Works
DiffSeg extracts the self-attention maps that a pre-trained Stable Diffusion model produces for an input image, aggregates them across resolutions, and iteratively merges them into segmentation masks. An experimental feature further generates semantic labels for the masks using a BLIP captioning model. Because no explicit segmentation training data or prompts are required, the approach can be applied directly to unseen images.
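The repository's real pipeline lives in its notebooks; the sketch below is only a toy illustration of the attention-merging idea, using random arrays in place of Stable Diffusion's aggregated self-attention. The helper names (kl_div, segment_from_attention) and the greedy merging rule are hypothetical simplifications, not the repository's actual API.

```python
# Toy sketch: merge per-pixel attention distributions into segment proposals.
# Random tensors stand in for aggregated Stable Diffusion self-attention.
import numpy as np

def kl_div(p, q, eps=1e-8):
    """KL divergence between two (unnormalized) attention maps."""
    p = p / (p.sum() + eps)
    q = q / (q.sum() + eps)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def segment_from_attention(attn, kl_threshold=1.0):
    """Greedily group pixels whose attention distributions are similar.

    attn has shape (H, W, H, W): attn[i, j] is the attention of pixel (i, j)
    over the whole image (already aggregated across layers in the real method).
    """
    H, W = attn.shape[:2]
    proposals = []  # list of (running-mean attention map, member pixels)
    for i in range(H):
        for j in range(W):
            a = attn[i, j]
            # Find the closest existing proposal by symmetric KL divergence.
            best, best_d = None, None
            for k, (m, _) in enumerate(proposals):
                d = 0.5 * (kl_div(a, m) + kl_div(m, a))
                if best_d is None or d < best_d:
                    best, best_d = k, d
            if best is not None and best_d < kl_threshold:
                m, members = proposals[best]
                members.append((i, j))
                proposals[best] = (m + (a - m) / len(members), members)
            else:
                proposals.append((a.copy(), [(i, j)]))
    # Paint a label map from the merged proposals.
    labels = np.zeros((H, W), dtype=np.int32)
    for k, (_, members) in enumerate(proposals):
        for (i, j) in members:
            labels[i, j] = k
    return labels

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H = W = 16
    toy_attn = rng.random((H, W, H, W))  # stand-in for SD self-attention
    print(np.unique(segment_from_attention(toy_attn)))
```

In DiffSeg itself the attention tensors come from the diffusion UNet's self-attention layers at multiple resolutions and are aggregated before merging; the random tensor here merely exercises the control flow.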
Quick Start & Requirements
Create a conda environment (conda create --name diffseg python=3.9), activate it (conda activate diffseg), and install dependencies (pip install -r requirements.txt). The main demo is in diffseg.ipynb; benchmarking instructions are in benchmarks.ipynb.
Highlighted Details
Maintenance & Community
Contributors include researchers from Google and Georgia Tech. No community channels (Discord/Slack) or roadmap links are provided in the README, and the repository appears inactive, with its last update roughly a year ago.
Licensing & Compatibility
The repository's license is not explicitly stated in the provided README. Compatibility for commercial use or closed-source linking is therefore undetermined.
Limitations & Caveats
The setup requires specific hardware (2x high-VRAM GPUs) and a particular environment configuration (Ubuntu 18.04, CUDA 11.x, TF 2.14), which may pose adoption challenges. The semantic labeling feature is experimental.