semantic-draw by ironjr

Interactive content creation from image diffusion models

Created 1 year ago

583 stars


Project Summary

SemanticDraw enables real-time, interactive image generation with fine-grained regional control using diffusion models. It allows users to "paint" with text prompts, assigning specific semantic meanings to different areas of an image, making complex image creation accessible to artists and creators.

How It Works

SemanticDraw combines the region-based control of MultiDiffusion with few-step acceleration methods such as LCM and StreamDiffusion, which were previously incompatible. Its core contribution is an efficient way to apply multiple text prompts to distinct image regions during accelerated sampling, which cuts the latency of multi-region text-to-image generation from hours to seconds.
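Conceptually, the region-based control inherited from MultiDiffusion runs the denoiser once per prompt and blends the results pixel by pixel, weighting each prediction by its region mask. The sketch below shows only that blending step on plain Python lists. `fuse_regions` is a hypothetical helper, not part of the SemanticDraw codebase. A real implementation applies this to latent tensors at every denoising step.

```python
from typing import List

Grid = List[List[float]]


def fuse_regions(predictions: List[Grid], masks: List[Grid]) -> Grid:
    """Blend per-prompt predictions into one grid by mask-weighted averaging.

    predictions[i][y][x] is the value predicted under prompt i, and
    masks[i][y][x] is the non-negative weight of prompt i at that pixel.
    All grids are assumed to share the same height and width.
    """
    if not predictions or len(predictions) != len(masks):
        raise ValueError("need at least one prediction and exactly one mask per prediction")
    height, width = len(predictions[0]), len(predictions[0][0])
    fused: Grid = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Total weight of all prompts that claim this pixel.
            total_weight = sum(m[y][x] for m in masks)
            if total_weight == 0:
                raise ValueError(f"pixel ({y}, {x}) is not covered by any mask")
            weighted = sum(p[y][x] * m[y][x] for p, m in zip(predictions, masks))
            fused[y][x] = weighted / total_weight
    return fused


if __name__ == "__main__":
    background = [[0.0, 0.0], [0.0, 0.0]]  # prediction under the background prompt
    cat = [[1.0, 1.0], [1.0, 1.0]]         # prediction under a foreground prompt
    bg_mask = [[0, 1], [1, 1]]
    cat_mask = [[1, 0], [0, 0]]
    print(fuse_regions([background, cat], [bg_mask, cat_mask]))
    # prints [[1.0, 0.0], [0.0, 0.0]]
```

With hard, non-overlapping masks each pixel simply takes the value from its own prompt. Where soft masks overlap, the prompts are averaged, which is also where content from different regions can start to mix.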

Quick Start & Requirements

Install: conda create -n smd python=3.10 && conda activate smd, then git clone https://github.com/ironjr/StreamMultiDiffusion && cd StreamMultiDiffusion, then pip install -r requirements.txt. For SD3 support, additionally run pip install git+https://github.com/initml/diffusers.git@clement/feature/flash_sd3.
Prerequisites: Python 3.10, CUDA, and a GPU with sufficient VRAM (16GB+ recommended for SDXL).
Demos: Hugging Face Spaces available for SD1.5, SDXL, and SD3. Colab notebook also provided.
Docs: Detailed usage and architecture explained in the paper appendices.

Highlighted Details

Real-time interactive generation with semantic brushes.
Supports Stable Diffusion v1.5, SDXL, and Stable Diffusion 3.
Achieves 6.3-second generation for 1024x1024 images with SD3.
Enables prompt separation to avoid content mixing between regions.
Supports image inpainting and panorama generation.
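Panorama generation in MultiDiffusion follows the same averaging principle. The wide latent is split into overlapping windows of the model's native size, each window is denoised independently, and overlapping outputs are averaged. The one-dimensional sketch below uses hypothetical helpers `window_starts` and `denoise_panorama`, plus a caller-supplied `denoise` function that stands in for one diffusion step. It is an illustration of the MultiDiffusion-style technique, not SemanticDraw's actual code.

```python
from typing import Callable, List


def window_starts(total: int, window: int, stride: int) -> List[int]:
    """Left offsets of overlapping windows that together cover [0, total)."""
    if window > total:
        raise ValueError("window must not exceed the panorama width")
    starts = list(range(0, total - window + 1, stride))
    if starts[-1] + window < total:
        # Add one final window flush with the right edge so it is covered too.
        starts.append(total - window)
    return starts


def denoise_panorama(row: List[float], window: int, stride: int,
                     denoise: Callable[[List[float]], List[float]]) -> List[float]:
    """Apply `denoise` to each window and average wherever windows overlap."""
    sums = [0.0] * len(row)
    counts = [0] * len(row)
    for s in window_starts(len(row), window, stride):
        out = denoise(row[s:s + window])
        for i, v in enumerate(out):
            sums[s + i] += v
            counts[s + i] += 1
    # Every position is covered at least once, so no count is zero.
    return [total / n for total, n in zip(sums, counts)]


if __name__ == "__main__":
    print(window_starts(10, 4, 4))  # prints [0, 4, 6]
    row = [float(i) for i in range(10)]
    print(denoise_panorama(row, 4, 3, lambda w: [2 * x for x in w]))
```

A smaller stride gives more overlap, and therefore smoother seams between windows, at the cost of more denoiser calls per step.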

Maintenance & Community

The project is associated with CVPR 2025. Hugging Face Spaces provide interactive demos. The primary contact is jarin.lee@gmail.com.

Licensing & Compatibility

Released under the MIT License, permitting personal and commercial use with attribution (the copyright and license notice must be retained).

Limitations & Caveats

SDXL-Lightning support is experimental. It may show weaker prompt adherence, and FP16 variants can produce NaN outputs (use dtype=torch.float32 for vanilla SDXL-Lightning). The current GUI is built on Gradio's ImageEditor component and could be improved by integrating more advanced JavaScript drawing tools.


Explore Similar Projects

OneReward by bytedance

peacasso by victordibia

mixture-of-diffusers by albarji

kandinsky-5 by kandinskylab

clip-guided-diffusion by afiaka87

MultiDiffusion by omerbt

infinite-zoom-automatic1111-webui by v8hid

stable-diffusion-2-gui by qunash

RPG-DiffusionMaster by YangLing0818

imaginAIry by brycedrennan

sygil-webui by Sygil-Dev

latent-diffusion by CompVis