Interactive content creation from image diffusion models
SemanticDraw enables real-time, interactive image generation with fine-grained regional control using diffusion models. It allows users to "paint" with text prompts, assigning specific semantic meanings to different areas of an image, making complex image creation accessible to artists and creators.
How It Works
SemanticDraw integrates region-based control techniques from MultiDiffusion with acceleration methods like LCM and StreamDiffusion. This combination overcomes previous incompatibilities, enabling significantly faster generation times for multi-region, text-to-image tasks. The core innovation lies in efficiently managing and applying multiple text prompts to distinct image regions, drastically reducing latency from hours to seconds.
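The per-region blending described above can be sketched in a few lines. This is a minimal, hypothetical illustration of MultiDiffusion-style region blending (not the project's actual API): each prompt's noise prediction applies only inside its mask, and overlapping predictions are averaged per pixel.

```python
# Sketch of MultiDiffusion-style region blending (illustrative only):
# each prompt i contributes its prediction pred_i where mask_i == 1,
# and overlapping contributions are averaged per pixel.

def blend_regions(predictions, masks):
    """predictions: list of 2D grids (one noise estimate per prompt);
    masks: list of 2D 0/1 grids of the same shape."""
    h, w = len(predictions[0]), len(predictions[0][0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, weight = 0.0, 0.0
            for pred, mask in zip(predictions, masks):
                if mask[y][x]:
                    total += pred[y][x]
                    weight += 1
            # average where prompts overlap; 0 outside every mask
            out[y][x] = total / weight if weight else 0.0
    return out

# Two prompts on a 1x3 image: "sky" covers the left two pixels,
# "forest" the right two, so the middle pixel is their average.
sky, forest = [[1.0, 1.0, 1.0]], [[3.0, 3.0, 3.0]]
m_sky, m_forest = [[1, 1, 0]], [[0, 1, 1]]
print(blend_regions([sky, forest], [m_sky, m_forest]))
# → [[1.0, 2.0, 3.0]]
```

In the real pipeline this blending runs inside the denoising loop on latent tensors, which is where the acceleration methods (LCM, StreamDiffusion) come in; the sketch only shows the masked-average idea.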
Quick Start & Requirements
```bash
conda create -n smd python=3.10 && conda activate smd
git clone https://github.com/ironjr/StreamMultiDiffusion
cd StreamMultiDiffusion
pip install -r requirements.txt
```

For SD3 support: `pip install git+https://github.com/initml/diffusers.git@clement/feature/flash_sd3`
Maintenance & Community
The project is associated with CVPR 2025. Hugging Face Spaces provide interactive demos. The primary contact is jarin.lee@gmail.com.
Licensing & Compatibility
Released under the MIT License, permitting personal and commercial use with citation.
Limitations & Caveats
SDXL-Lightning support is experimental: FP16 variants may show weaker prompt adherence or produce NaN values (use `dtype=torch.float32` for vanilla SDXL-Lightning). The current GUI relies on Gradio's `ImageEditor`, and could be improved by integrating a more advanced JavaScript drawing tool.
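The float32 workaround above can be encoded as a small guard when choosing a dtype. This is a hedged, hypothetical helper (not part of the project) that simply applies the stated rule; the model-name check is an assumption for illustration.

```python
# Hypothetical helper encoding the caveat above: force float32 for
# vanilla SDXL-Lightning (FP16 variants may produce NaNs), float16
# otherwise. Returns a dtype name you would pass as torch.<name>.

def pick_dtype(model_name: str) -> str:
    """Pick a dtype name based on the model identifier (illustrative rule)."""
    if "sdxl-lightning" in model_name.lower():
        return "float32"  # avoid NaN issues reported for FP16
    return "float16"      # faster default elsewhere

print(pick_dtype("ByteDance/SDXL-Lightning"))        # float32
print(pick_dtype("runwayml/stable-diffusion-v1-5"))  # float16
```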