DiffSeg by Google

Unsupervised zero-shot segmentation method using Stable Diffusion attention

created 1 year ago
317 stars

Top 86.5% on sourcepulse

Project Summary

DiffSeg offers an unsupervised, zero-shot image segmentation approach that leverages the attention maps of a pre-trained Stable Diffusion model. It targets researchers and practitioners in computer vision who need segmentation without labeled datasets or task-specific training.

How It Works

DiffSeg extracts the internal self-attention maps of a pre-trained Stable Diffusion model, aggregates them across resolutions, and iteratively merges similar maps until stable segmentation masks emerge. An experimental feature additionally generates semantic labels for the masks using a BLIP captioning model. Because the method relies only on a pre-trained diffusion model, it needs no explicit segmentation training data.
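The iterative-merging idea can be illustrated with a minimal NumPy sketch. This is not the repository's implementation (which operates on real Stable Diffusion attention tensors in TensorFlow); the function names, the symmetric-KL similarity, and the merge threshold below are simplifying assumptions chosen for illustration:

```python
import numpy as np

def kl_div(p, q, eps=1e-8):
    # KL divergence between two attention maps, normalized to distributions
    p = p / (p.sum() + eps)
    q = q / (q.sum() + eps)
    return np.sum(p * np.log((p + eps) / (q + eps)))

def merge_attention_maps(attn, threshold=1.0):
    """Greedily merge per-anchor attention maps into segment proposals.

    attn: (N, H, W) array, one spatial attention map per anchor point.
    Maps whose symmetric KL divergence falls below `threshold` are
    averaged together (a hypothetical simplification of DiffSeg's
    iterative merging step).
    """
    proposals = []
    for a in attn:
        for i, p in enumerate(proposals):
            if 0.5 * (kl_div(a, p) + kl_div(p, a)) < threshold:
                proposals[i] = 0.5 * (p + a)  # fold into existing proposal
                break
        else:
            proposals.append(a)  # no similar proposal: start a new segment
    return proposals

def masks_from_proposals(proposals):
    # Label each pixel with the proposal holding the most attention mass there
    stack = np.stack([p / p.sum() for p in proposals])
    return np.argmax(stack, axis=0)
```

With two synthetic attention maps concentrated on disjoint image halves, the sketch merges duplicates and assigns each half its own label, mirroring how similar attention maps collapse into one segment.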

Quick Start & Requirements

  • Install: Create a conda environment (conda create --name diffseg python=3.9, conda activate diffseg) and install dependencies (pip install -r requirements.txt).
  • Prerequisites: Ubuntu 18.04, Python 3.9, TensorFlow 2.14, CUDA 11.x.
  • Hardware: Recommends 2 GPUs with at least 11GB VRAM each (e.g., RTX 2080Ti) for Stable Diffusion and BLIP models.
  • Instructions: Detailed usage is available in diffseg.ipynb. Benchmarking instructions are in benchmarks.ipynb.
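The install steps above can be collected into one setup script. Commands and file names follow the README; launching the notebook with `jupyter` assumes Jupyter is available in the environment (it may or may not be pinned in `requirements.txt`):

```shell
# Create and activate the conda environment (Python 3.9 per the README)
conda create --name diffseg python=3.9 -y
conda activate diffseg

# Install the pinned dependencies (TensorFlow 2.14 among them)
pip install -r requirements.txt

# Open the usage notebook for detailed instructions
jupyter notebook diffseg.ipynb
```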

Highlighted Details

  • Implements the DiffSeg algorithm as described in the paper "Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion".
  • Includes an experimental feature for adding semantic labels via BLIP captioning.
  • Benchmarked on COCO-Stuff-27 and Cityscapes datasets using the PiCIE evaluation protocol.

Maintenance & Community

Contributors include researchers from Google and Georgia Tech. No specific community channels (Discord/Slack) or roadmap links are provided in the README.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README. Compatibility for commercial use or closed-source linking is therefore undetermined.

Limitations & Caveats

The setup requires specific hardware (2x high-VRAM GPUs) and a particular environment configuration (Ubuntu 18.04, CUDA 11.x, TF 2.14), which may pose adoption challenges. The semantic labeling feature is experimental.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 12 stars in the last 90 days
