Stable Diffusion for text-guided image generation from segmentation maps
This repository implements "Paint-with-Words" (PwW), a technique inspired by NVIDIA's eDiff-I, enabling users to control Stable Diffusion image generation using text-labeled segmentation maps. It allows for precise object placement, composition control, and regional seeding, benefiting artists and researchers seeking fine-grained control over AI image synthesis.
How It Works
PwW leverages Stable Diffusion's cross-attention mechanism to interpret segmentation maps. Each color in the map corresponds to a text label with an associated attention strength. During generation, the model adjusts cross-attention scores based on these labels and strengths, effectively "painting" the scene according to the segmentation. The implementation offers customizable weight scaling functions to fine-tune the influence of different regions and allows for regional seeding to control the randomness of specific elements.
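As an illustration, the core mechanism can be sketched as an additive bias on the cross-attention logits. This is a minimal sketch only: the tensor shapes, the weight-scaling form, and the function name are assumptions for clarity, not the repository's exact code.

import math
import torch

def pww_attention_scores(q, k, region_masks, strengths, sigma):
    # q: (heads, n_pixels, d)   queries from a U-Net cross-attention layer
    # k: (heads, n_tokens, d)   keys from the text encoder
    # region_masks: (n_tokens, n_pixels) binary map; 1 where a token's
    #               segmentation region covers a pixel
    # strengths: (n_tokens,) per-label attention strength
    # sigma: current diffusion noise level (scalar)
    d = q.shape[-1]
    scores = q @ k.transpose(-1, -2) / d ** 0.5   # (heads, n_pixels, n_tokens)
    # One candidate weight-scaling function: the bias grows with the noise
    # level and is scaled by the spread of the raw attention scores, so
    # regional guidance is strongest early in denoising.
    w = region_masks.transpose(0, 1) * strengths  # (n_pixels, n_tokens)
    bias = w * math.log(1.0 + sigma) * scores.std()
    return torch.softmax(scores + bias, dim=-1)

Swapping in a different weight-scaling function (the customization the repository exposes) amounts to changing how bias is computed from w, sigma, and the raw scores.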
Quick Start & Requirements
Install directly from the repository:

pip install git+https://github.com/cloneofsimo/paint-with-words-sd.git

Stable Diffusion weights are downloaded from Hugging Face, so supply an access token via a .env file.
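A minimal end-to-end sketch, assuming the package exposes a paint_with_words entry point as in the repository's example scripts; the argument names, colors, and labels below are illustrative and may differ across versions:

from PIL import Image
from paint_with_words import paint_with_words

# Each RGB color in the segmentation map is bound to a "label,strength" pair.
color_context = {
    (7, 9, 182): "aurora,0.5",
    (136, 178, 92): "full moon,1.5",
    (51, 193, 217): "mountains,0.4",
}

img = paint_with_words(
    color_context=color_context,
    color_map_image=Image.open("input_seg_map.png"),
    input_prompt="aurora over snowy mountains under a full moon",
    num_inference_steps=30,
    guidance_scale=7.5,
    device="cuda:0",
)
img.save("output.png")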
Highlighted Details
Maintenance & Community
The project ships a Gradio interface and an AUTOMATIC1111 WebUI extension. A TODO list indicates planned features such as extensive weight-function comparisons and negative-region support, though the repository has seen no updates in roughly two years.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Users should verify licensing for commercial use or integration into closed-source projects.
Limitations & Caveats
The README notes that the project is not compatible with the ControlNet extension in AUTOMATIC1111's WebUI without specific workarounds. Some planned features, such as sentence-wise text separation and negative regions, are still under development.