paint-with-words-sd  by cloneofsimo

Stable Diffusion for text-guided image generation from segmentation maps

created 2 years ago
645 stars

Top 52.6% on sourcepulse

Project Summary

This repository implements "Paint-with-Words" (PwW), a technique inspired by NVIDIA's eDiff-I, enabling users to control Stable Diffusion image generation using text-labeled segmentation maps. It allows for precise object placement, composition control, and regional seeding, benefiting artists and researchers seeking fine-grained control over AI image synthesis.

How It Works

PwW leverages Stable Diffusion's cross-attention mechanism to interpret segmentation maps. Each color in the map corresponds to a text label with an associated attention strength. During generation, the model adjusts cross-attention scores based on these labels and strengths, effectively "painting" the scene according to the segmentation. The implementation offers customizable weight scaling functions to fine-tune the influence of different regions and allows for regional seeding to control the randomness of specific elements.
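
The score adjustment described above can be sketched in plain Python. This is a minimal illustration, not the repository's code: the function names, the toy 1x2 segmentation map, and the bias form `scale * strength * log(1 + sigma) * max(scores)` (loosely modeled on the eDiff-I description, with `scale = 0.4` chosen arbitrarily) are all assumptions made for this example.

```python
import math

def region_mask(seg_map, color):
    """Flatten seg_map row by row into a 0/1 mask of pixels equal to color."""
    return [1.0 if px == color else 0.0 for row in seg_map for px in row]

def biased_attention(scores, token_masks, token_strengths, sigma, scale=0.4):
    """Add a regional bias to raw pixel-by-token scores, then softmax each row.

    scores: one row per pixel, each row holding one raw score per text token.
    token_masks: token index -> flat 0/1 mask over pixels.
    token_strengths: token index -> user-chosen attention strength.
    sigma: current noise level; the bias shrinks to zero as sigma goes to 0.
    """
    peak = max(max(row) for row in scores)
    noise_term = math.log(1.0 + sigma)
    result = []
    for p, row in enumerate(scores):
        biased = list(row)
        for t, mask in token_masks.items():
            biased[t] += scale * token_strengths[t] * noise_term * peak * mask[p]
        # Numerically stable softmax over the tokens for this pixel.
        top = max(biased)
        exps = [math.exp(v - top) for v in biased]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result

# Toy example: a 1x2 image, red pixel labeled token 0, blue pixel labeled token 1.
RED, BLUE = (255, 0, 0), (0, 0, 255)
seg_map = [[RED, BLUE]]
masks = {0: region_mask(seg_map, RED), 1: region_mask(seg_map, BLUE)}
strengths = {0: 1.0, 1: 1.0}
raw_scores = [[1.0, 1.0], [1.0, 1.0]]  # both tokens start out equally likely

attn = biased_attention(raw_scores, masks, strengths, sigma=1.0)
flat = biased_attention(raw_scores, masks, strengths, sigma=0.0)
```

In this toy setup each pixel ends up attending more to the token whose color covers it, and the effect vanishes when `sigma` is 0, which is one plausible way for the guidance to act mainly in the early, high-noise steps.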

Quick Start & Requirements

  • Install via pip: pip install git+https://github.com/cloneofsimo/paint-with-words-sd.git
  • Requires a Hugging Face token for Stable Diffusion, set in a .env file.
  • GPU with CUDA is recommended for performance.
  • Official notebooks and Gradio interfaces are available for demonstration and usage.

Highlighted Details

  • Enables precise control over object placement and composition via segmentation maps.
  • Supports regional seeding for specific elements within the generated image.
  • Offers image inpainting capabilities with segmentation map guidance.
  • Includes a script to convert Stable Diffusion checkpoints to the diffusers format for use with custom models.
  • Provides an extension for AUTOMATIC1111's Stable Diffusion WebUI for integrated PwW control.
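
Regional seeding, mentioned in the list above, can be pictured as swapping the initial noise inside a masked region for noise drawn from a separate seed. The sketch below is a hypothetical illustration using only the standard library; the function names and the list-of-lists noise layout are assumptions, and the repository may implement the idea differently.

```python
import random

def gaussian_noise(height, width, seed):
    """Return a height x width grid of standard normal samples for a seed."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]

def regional_noise(height, width, base_seed, region_seeds):
    """Build initial noise where each masked region uses its own seed.

    region_seeds: list of (mask, seed) pairs, where mask is a
    height x width grid of booleans marking the region.
    """
    noise = gaussian_noise(height, width, base_seed)
    for mask, seed in region_seeds:
        patch = gaussian_noise(height, width, seed)
        for y in range(height):
            for x in range(width):
                if mask[y][x]:
                    noise[y][x] = patch[y][x]
    return noise

# Pin the left half of a 2x4 grid to seed 7 while the background seed varies.
left_half = [[x < 2 for x in range(4)] for _ in range(2)]
noise_a = regional_noise(2, 4, base_seed=0, region_seeds=[(left_half, 7)])
noise_b = regional_noise(2, 4, base_seed=1, region_seeds=[(left_half, 7)])
```

Here the left half of `noise_a` and `noise_b` is identical while the right half differs, so the seeded region starts from the same noise even when the rest of the image is re-rolled.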

Maintenance & Community

The project shipped a Gradio interface and an AUTOMATIC1111 extension, but its last commit was 2 years ago and it is now listed as inactive. The README's TODO list names features that were planned, such as extensive weight function comparisons and negative region support.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Users should verify licensing for commercial use or integration into closed-source projects.

Limitations & Caveats

The README notes that the project is not compatible with the ControlNet extension in AUTOMATIC1111's WebUI without specific workarounds. Some planned features, such as sentence-wise text separation and negative regions, remain listed as planned rather than implemented.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

5 stars in the last 90 days
