semantic-draw by ironjr

Interactive content creation from image diffusion models

Created 1 year ago
577 stars

Top 56.1% on SourcePulse

Project Summary

SemanticDraw enables real-time, interactive image generation with fine-grained regional control using diffusion models. It allows users to "paint" with text prompts, assigning specific semantic meanings to different areas of an image, making complex image creation accessible to artists and creators.

How It Works

SemanticDraw combines region-based control from MultiDiffusion with acceleration methods such as LCM and StreamDiffusion. These two families of techniques were previously incompatible; making them work together yields much faster multi-region text-to-image generation. The core innovation is an efficient way to manage multiple text prompts and apply each one to its own image region, which the project reports cuts latency from hours to seconds.
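As a rough illustration of the region-based idea, the sketch below blends per-region outputs with a mask-weighted average, which is the general principle behind MultiDiffusion-style fusion. It is not the project's implementation: `blend_regions` and the tiny list-based "latents" are hypothetical, and a real pipeline would do this on tensors at each denoising step with additional machinery.

```python
from typing import List

Grid = List[List[float]]  # a tiny 2-D "latent", for illustration only


def blend_regions(region_outputs: List[Grid], masks: List[Grid], eps: float = 1e-8) -> Grid:
    """Mask-weighted average of per-region outputs, computed pixel by pixel.

    Hypothetical helper: each region (one per text prompt) contributes its
    output only where its mask is non-zero; overlapping regions are averaged.
    """
    if not region_outputs or len(region_outputs) != len(masks):
        raise ValueError("need at least one region and exactly one mask per region")
    height, width = len(masks[0]), len(masks[0][0])
    blended = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            weight_sum = sum(m[y][x] for m in masks)
            value_sum = sum(m[y][x] * out[y][x] for out, m in zip(region_outputs, masks))
            # Pixels covered by no mask stay at 0.0 instead of dividing by zero.
            blended[y][x] = value_sum / weight_sum if weight_sum > eps else 0.0
    return blended


# Two regions on a 1x3 canvas that overlap in the middle pixel.
region_a = [[1.0, 1.0, 1.0]]   # output conditioned on prompt A
region_b = [[3.0, 3.0, 3.0]]   # output conditioned on prompt B
mask_a = [[1.0, 1.0, 0.0]]
mask_b = [[0.0, 1.0, 1.0]]
result = blend_regions([region_a, region_b], [mask_a, mask_b])
print(result)  # -> [[1.0, 2.0, 3.0]]
```

The middle pixel is covered by both masks, so it receives the average of the two regional outputs; the outer pixels each take the value of the single region that covers them.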

Quick Start & Requirements

  • Install: create and activate an environment with conda create -n smd python=3.10 && conda activate smd, clone the repository with git clone https://github.com/ironjr/StreamMultiDiffusion, then run pip install -r requirements.txt from inside the cloned directory. For SD3 support, also run pip install git+https://github.com/initml/diffusers.git@clement/feature/flash_sd3.
  • Prerequisites: Python 3.10, CUDA, GPU with sufficient VRAM (e.g., 16GB+ recommended for SDXL).
  • Demos: Hugging Face Spaces available for SD1.5, SDXL, and SD3. Colab notebook also provided.
  • Docs: Detailed usage and architecture explained in the paper appendices.

Highlighted Details

  • Real-time interactive generation with semantic brushes.
  • Supports Stable Diffusion v1.5, SDXL, and Stable Diffusion 3.
  • Achieves 6.3-second generation for 1024x1024 images with SD3.
  • Enables prompt separation to avoid content mixing between regions.
  • Supports image inpainting and panorama generation.
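Panorama generation in MultiDiffusion-style methods generally works by denoising overlapping windows of a wide canvas and averaging the overlaps. The sketch below shows only the window tiling step, under that assumption. `sliding_windows` and the sizes used are hypothetical and are not taken from the project's code.

```python
from typing import List, Tuple


def sliding_windows(total: int, window: int, stride: int) -> List[Tuple[int, int]]:
    """Return (start, end) spans of overlapping windows that cover [0, total).

    Hypothetical helper: the last window is shifted back so it ends exactly
    at the canvas edge instead of running past it.
    """
    if total <= 0 or window <= 0 or stride <= 0:
        raise ValueError("total, window and stride must be positive")
    if total <= window:
        return [(0, total)]
    starts = list(range(0, total - window + 1, stride))
    if starts[-1] + window < total:
        starts.append(total - window)
    return [(s, s + window) for s in starts]


# A canvas 160 units wide, covered by windows 64 wide with a stride of 48.
spans = sliding_windows(160, 64, 48)
print(spans)  # -> [(0, 64), (48, 112), (96, 160)]
```

Each adjacent pair of windows overlaps by 16 units here; in a panorama pipeline those overlapping columns would be averaged so the seams blend smoothly.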

Maintenance & Community

The project is associated with CVPR 2025. Hugging Face Spaces provide interactive demos. The primary contact is jarin.lee@gmail.com.

Licensing & Compatibility

Released under the MIT License, which permits personal and commercial use as long as the copyright and license notice is retained; citation of the project is requested.

Limitations & Caveats

SDXL-Lightning support is experimental, potentially leading to less prompt obedience or NaN issues with FP16 variants (use dtype=torch.float32 for vanilla SDXL-Lightning). The current GUI uses Gradio's ImageEditor, with potential for improvement by integrating more advanced JavaScript drawing tools.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 30 days
