semantic-draw  by ironjr

Interactive content creation from image diffusion models

created 1 year ago
576 stars

Top 56.9% on sourcepulse

Project Summary

SemanticDraw enables real-time, interactive image generation with fine-grained regional control using diffusion models. It allows users to "paint" with text prompts, assigning specific semantic meanings to different areas of an image, making complex image creation accessible to artists and creators.

How It Works

SemanticDraw combines the region-based control of MultiDiffusion with acceleration methods such as LCM and StreamDiffusion. Region-based control and these accelerated samplers were previously incompatible; resolving that makes multi-region text-to-image generation much faster. The core idea is to manage multiple text prompts efficiently and apply each one to its own image region, which cuts latency from hours to seconds.
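
To make the region-based part concrete, here is a minimal sketch of the mask-weighted averaging used in MultiDiffusion-style fusion: each prompt yields its own prediction, and every pixel takes the average of the predictions whose masks cover it. This is an illustration only, not the project's code. The function name `blend_regions`, the toy 2x3 "canvas", and the constant prediction values are all assumptions, and a real pipeline would apply this to latent tensors at each denoising step.

```python
from typing import List

Grid = List[List[float]]


def blend_regions(predictions: List[Grid], masks: List[Grid], eps: float = 1e-8) -> Grid:
    """Mask-weighted average of per-prompt predictions, pixel by pixel."""
    height, width = len(masks[0]), len(masks[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            weight_sum = sum(m[y][x] for m in masks)
            value_sum = sum(m[y][x] * p[y][x] for m, p in zip(masks, predictions))
            # Pixels covered by no mask are left at 0.0 in this sketch.
            fused[y][x] = value_sum / weight_sum if weight_sum > eps else 0.0
    return fused


# Hypothetical example: two prompts whose masks overlap in the middle column.
pred_a = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]  # stand-in for the prompt A prediction
pred_b = [[3.0, 3.0, 3.0], [3.0, 3.0, 3.0]]  # stand-in for the prompt B prediction
mask_a = [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]]  # left two columns
mask_b = [[0.0, 1.0, 1.0], [0.0, 1.0, 1.0]]  # right two columns

fused = blend_regions([pred_a, pred_b], [mask_a, mask_b])
print(fused)
```

Where only one mask is active, the pixel follows that prompt alone; in the overlap, the two predictions are averaged. Soft (fractional) masks work the same way and give smoother transitions between regions.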

Quick Start & Requirements

  • Install: run these commands in order.
      conda create -n smd python=3.10 && conda activate smd
      git clone https://github.com/ironjr/StreamMultiDiffusion
      pip install -r requirements.txt (from inside the cloned directory)
    For SD3 support, also run: pip install git+https://github.com/initml/diffusers.git@clement/feature/flash_sd3
  • Prerequisites: Python 3.10, CUDA, GPU with sufficient VRAM (e.g., 16GB+ recommended for SDXL).
  • Demos: Hugging Face Spaces available for SD1.5, SDXL, and SD3. Colab notebook also provided.
  • Docs: Detailed usage and architecture explained in the paper appendices.

Highlighted Details

  • Real-time interactive generation with semantic brushes.
  • Supports Stable Diffusion v1.5, SDXL, and Stable Diffusion 3.
  • Achieves 6.3-second generation for 1024x1024 images with SD3.
  • Enables prompt separation to avoid content mixing between regions.
  • Supports image inpainting and panorama generation.
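
As background for the panorama feature: MultiDiffusion-style panorama generation denoises overlapping windows of a wide canvas and averages the results where windows overlap. The sketch below only computes the window layout and the per-column overlap count. The function names, canvas width, window size, and stride are illustrative assumptions and are not taken from the project.

```python
from typing import List, Tuple


def sliding_windows(width: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """Return (start, end) column ranges that cover a canvas of the given width."""
    if window >= width:
        return [(0, width)]
    starts = list(range(0, width - window + 1, stride))
    if starts[-1] + window < width:
        starts.append(width - window)  # add a final window so the right edge is covered
    return [(s, s + window) for s in starts]


def coverage(width: int, windows: List[Tuple[int, int]]) -> List[int]:
    """Count how many windows cover each column; used as the averaging divisor."""
    counts = [0] * width
    for start, end in windows:
        for x in range(start, end):
            counts[x] += 1
    return counts


windows = sliding_windows(width=16, window=8, stride=4)
counts = coverage(16, windows)
print(windows)
print(counts)
```

Every column is covered at least once, and columns shared by two windows get a count of 2, so the summed predictions there are divided by 2. A smaller stride increases overlap and tends to hide seams, at the cost of more denoising work per step.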

Maintenance & Community

The project is associated with CVPR 2025. Hugging Face Spaces provide interactive demos. The primary contact is jarin.lee@gmail.com.

Licensing & Compatibility

Released under the MIT License, permitting personal and commercial use with citation.

Limitations & Caveats

SDXL-Lightning support is experimental: prompt adherence may be weaker, and FP16 variants can produce NaN values (use dtype=torch.float32 for vanilla SDXL-Lightning). The current GUI is built on Gradio's ImageEditor and could be improved by integrating more advanced JavaScript drawing tools.
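
One common way half precision produces NaN is overflow: FP16 cannot represent finite values above 65504, so a large intermediate becomes infinity, and a later operation such as inf - inf yields NaN. The sketch below emulates FP16 rounding with Python's standard `struct` module. It illustrates the general mechanism under that assumption and is not a diagnosis of where the NaN arises in SDXL-Lightning; `to_fp16` is a hypothetical helper.

```python
import math
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    try:
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:
        # struct rejects finite values outside the FP16 range;
        # FP16 arithmetic would overflow to infinity instead.
        return math.copysign(math.inf, value)


activation = to_fp16(300.0)                       # exactly representable in FP16
squared_fp16 = to_fp16(activation * activation)   # 90000 exceeds the FP16 range
diff = squared_fp16 - squared_fp16                # inf - inf

squared_fp32 = activation * activation            # a wider format keeps this finite

print(squared_fp16, diff, squared_fp32)
```

The same value that overflows in FP16 stays finite in a wider format, which is why switching to float32 avoids this class of failure at the cost of more memory and slower inference.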

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 16 stars in the last 90 days
