DiffSeg by Google

Unsupervised zero-shot segmentation method using Stable Diffusion attention

created 1 year ago
317 stars

Top 86.5% on sourcepulse

Project Summary

DiffSeg offers an unsupervised, zero-shot image segmentation approach that leverages the attention maps of a pre-trained Stable Diffusion model. It targets researchers and practitioners in computer vision who need segmentation without labeled datasets or task-specific training.

How It Works

DiffSeg extracts the internal self-attention maps of a pre-trained Stable Diffusion model, aggregates them across resolutions, and iteratively merges similar maps until stable segmentation masks emerge. An experimental feature additionally generates semantic labels for the masks using a BLIP captioning model. Because the method relies only on a pre-trained diffusion model, it needs no explicit segmentation training data.
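The iterative-merging idea can be illustrated with a minimal NumPy sketch. This is not the repository's implementation (which operates on real Stable Diffusion attention tensors in TensorFlow); the function names, the symmetric-KL similarity, and the merge threshold below are simplifying assumptions chosen for illustration:

```python
import numpy as np

def kl_div(p, q, eps=1e-8):
    # KL divergence between two attention maps, normalized to distributions
    p = p / (p.sum() + eps)
    q = q / (q.sum() + eps)
    return np.sum(p * np.log((p + eps) / (q + eps)))

def merge_attention_maps(attn, threshold=1.0):
    """Greedily merge per-anchor attention maps into segment proposals.

    attn: (N, H, W) array, one spatial attention map per anchor point.
    Maps whose symmetric KL divergence falls below `threshold` are
    averaged together (a hypothetical simplification of DiffSeg's
    iterative merging step).
    """
    proposals = []
    for a in attn:
        for i, p in enumerate(proposals):
            if 0.5 * (kl_div(a, p) + kl_div(p, a)) < threshold:
                proposals[i] = 0.5 * (p + a)  # fold into existing proposal
                break
        else:
            proposals.append(a)  # no similar proposal: start a new segment
    return proposals

def masks_from_proposals(proposals):
    # Label each pixel with the proposal holding the most attention mass there
    stack = np.stack([p / p.sum() for p in proposals])
    return np.argmax(stack, axis=0)
```

With two synthetic attention maps concentrated on disjoint image halves, the sketch merges duplicates and assigns each half its own label, mirroring how similar attention maps collapse into one segment.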

Quick Start & Requirements

  • Install: Create a conda environment (conda create --name diffseg python=3.9, conda activate diffseg) and install dependencies (pip install -r requirements.txt).
  • Prerequisites: Ubuntu 18.04, Python 3.9, TensorFlow 2.14, CUDA 11.x.
  • Hardware: Recommends 2 GPUs with at least 11GB VRAM each (e.g., RTX 2080Ti) for Stable Diffusion and BLIP models.
  • Instructions: Detailed usage is available in diffseg.ipynb. Benchmarking instructions are in benchmarks.ipynb.
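The install steps above can be collected into one setup script. Commands and file names follow the README; launching the notebook with `jupyter` assumes Jupyter is available in the environment (it may or may not be pinned in `requirements.txt`):

```shell
# Create and activate the conda environment (Python 3.9 per the README)
conda create --name diffseg python=3.9 -y
conda activate diffseg

# Install the pinned dependencies (TensorFlow 2.14 among them)
pip install -r requirements.txt

# Open the usage notebook for detailed instructions
jupyter notebook diffseg.ipynb
```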

Highlighted Details

  • Implements the DiffSeg algorithm as described in the paper "Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion".
  • Includes an experimental feature for adding semantic labels via BLIP captioning.
  • Benchmarked on COCO-Stuff-27 and Cityscapes datasets using the PiCIE evaluation protocol.

Maintenance & Community

Contributors include researchers from Google and Georgia Tech. No specific community channels (Discord/Slack) or roadmap links are provided in the README.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README. Compatibility for commercial use or closed-source linking is therefore undetermined.

Limitations & Caveats

The setup requires specific hardware (2x high-VRAM GPUs) and a particular environment configuration (Ubuntu 18.04, CUDA 11.x, TF 2.14), which may pose adoption challenges. The semantic labeling feature is experimental.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 12 stars in the last 90 days
