paint-with-words-sd by cloneofsimo

Stable Diffusion for text-guided image generation from segmentation maps

Created 3 years ago

646 stars

Top 51.7% on SourcePulse


3 Experts Love This Project

Andreas Jansson

Cofounder of Replicate

Christian Laforte

Distinguished Engineer at NVIDIA; Former CTO at Stability AI

Patrick von Platen

Author of Hugging Face Diffusers; Research Engineer at Mistral

Project Summary

This repository implements "Paint-with-Words" (PwW), a technique inspired by NVIDIA's eDiff-I, enabling users to control Stable Diffusion image generation using text-labeled segmentation maps. It allows for precise object placement, composition control, and regional seeding, benefiting artists and researchers seeking fine-grained control over AI image synthesis.

How It Works

PwW builds on Stable Diffusion's cross-attention mechanism. Each color in the segmentation map corresponds to a text label with an associated attention strength. During denoising, the cross-attention scores between image locations inside a colored region and the prompt tokens of that region's label are increased in proportion to the label's strength, so the label is effectively "painted" into that part of the scene. The implementation lets users customize the weight scaling function that turns these strengths into attention biases, and it supports regional seeding to control the randomness of specific elements.
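The mechanism can be illustrated with a toy, dependency-free sketch. It assumes the eDiff-I-style formulation, in which a bias proportional to a label's strength is added to the query-key scores before the softmax. The `"label,strength"` string format, the `log(1 + sigma)` scaling in `weight_fn`, and all function names are illustrative assumptions, not this repository's API.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Color = Tuple[int, int, int]


def parse_color_context(context: Dict[Color, str]) -> Dict[Color, Tuple[str, float]]:
    """Split hypothetical "label,strength" entries into (label, strength) pairs."""
    parsed = {}
    for color, spec in context.items():
        label, strength = spec.rsplit(",", 1)  # labels may themselves contain commas
        parsed[color] = (label.strip(), float(strength))
    return parsed


def weight_fn(strength: float, sigma: float, score_scale: float) -> float:
    """Hypothetical scaling: the bias grows with the noise level sigma and is
    proportional to the largest score magnitude so it stays comparable to QK^T."""
    return strength * math.log(1.0 + sigma) * score_scale


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def biased_cross_attention(
    scores: List[List[float]],
    pixel_colors: List[Color],
    token_labels: List[Optional[str]],
    context: Dict[Color, str],
    sigma: float,
) -> List[List[float]]:
    """scores[i][j] is the scaled query-key score between image location i and
    prompt token j. A location inside a labeled region gets an extra bias on
    the tokens of that region's label before the softmax."""
    parsed = parse_color_context(context)
    score_scale = max(abs(s) for row in scores for s in row)
    probs = []
    for i, row in enumerate(scores):
        label, strength = parsed.get(pixel_colors[i], (None, 0.0))
        bias = weight_fn(strength, sigma, score_scale)
        biased = [
            s + bias if label is not None and token_labels[j] == label else s
            for j, s in enumerate(row)
        ]
        probs.append(softmax(biased))
    return probs


# Toy example: 3 image locations, 4 prompt tokens ("a", "corgi", "under", "sky").
context = {(255, 0, 0): "corgi,1.0", (0, 0, 255): "sky,0.5"}
token_labels = [None, "corgi", None, "sky"]
pixel_colors = [(255, 0, 0), (0, 0, 255), (0, 0, 0)]  # last location is unlabeled
scores = [[1.0, 1.0, 1.0, 1.0] for _ in pixel_colors]
probs = biased_cross_attention(scores, pixel_colors, token_labels, context, sigma=1.0)
for row in probs:
    print([round(p, 3) for p in row])
```

In this sketch the bias scales with `sigma`, so it is largest at the early, high-noise steps when the coarse layout of the image is formed; swapping in a different `weight_fn` is the kind of choice the customizable weight scaling functions are meant to expose.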

Quick Start & Requirements

Install via pip: pip install git+https://github.com/cloneofsimo/paint-with-words-sd.git
Requires a Hugging Face access token for Stable Diffusion, stored in a .env file.
A CUDA-capable GPU is recommended for reasonable performance.
The repository provides notebooks and Gradio interfaces for demonstration and everyday use.

Highlighted Details

Enables precise control over object placement and composition via segmentation maps.
Supports regional seeding for specific elements within the generated image.
Offers image inpainting capabilities with segmentation map guidance.
Includes a script to convert Stable Diffusion checkpoints to the Diffusers format for use with custom models.
Provides an extension for AUTOMATIC1111's Stable Diffusion WebUI for integrated PwW control.
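Regional seeding can be illustrated with a minimal sketch, assuming the initial noise is a single-channel 2-D grid at the segmentation map's resolution (a real pipeline would draw noise for the latent tensor, typically at reduced resolution). `gaussian_field` and `regional_noise` are hypothetical helpers, and this is one plausible way to realize the feature rather than the repository's own code.

```python
import random
from typing import Dict, List, Tuple

Color = Tuple[int, int, int]


def gaussian_field(seed: int, height: int, width: int) -> List[List[float]]:
    """Standard-normal noise grid generated from a single seed."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]


def regional_noise(
    color_map: List[List[Color]],
    region_seeds: Dict[Color, int],
    global_seed: int,
) -> List[List[float]]:
    """Initial noise where pixels of a seeded color come from that color's own
    generator and all other pixels come from the global generator."""
    height, width = len(color_map), len(color_map[0])
    base = gaussian_field(global_seed, height, width)
    # One full-size field per seeded color, so a region's noise depends only on
    # its seed and pixel positions, not on other regions or the global seed.
    fields = {c: gaussian_field(s, height, width) for c, s in region_seeds.items()}
    return [
        [
            fields[color][y][x] if color in fields else base[y][x]
            for x, color in enumerate(row)
        ]
        for y, row in enumerate(color_map)
    ]


RED, BLUE = (255, 0, 0), (0, 0, 255)
color_map = [
    [RED, BLUE, BLUE],
    [RED, BLUE, BLUE],
]
a = regional_noise(color_map, {RED: 42}, global_seed=0)
b = regional_noise(color_map, {RED: 42}, global_seed=1)
# The seeded RED column is identical across global seeds; the BLUE area is re-rolled.
print(a[0][0] == b[0][0], a[0][1] == b[0][1])
```

Because each seeded region's noise depends only on its own seed, changing the global seed varies the rest of the image while the seeded element starts from the same noise.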

Maintenance & Community

The README describes a Gradio interface and an AUTOMATIC1111 extension as already implemented, and its TODO list mentions planned features such as extensive weight-function comparisons and negative-region support. The health metrics below, however, show no commits in about two years, so active development appears to have stopped.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

The README notes that the project is not compatible with the ControlNet extension in AUTOMATIC1111's WebUI without specific workarounds. Some planned features, such as sentence-wise text separation and negative regions, are still under development.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

1 star in the last 30 days