sd3.5 by Stability-AI

Reference library for Stable Diffusion 3.5 inference

created 9 months ago
1,256 stars


Project Summary

This repository provides a reference implementation for simple inference with Stable Diffusion 3.5 (SD3.5) and SD3. It targets developers and researchers who need to integrate SD3.5 into their applications, and provides foundational code for the text encoders, the VAE decoder, and the MM-DiT architecture.

How It Works

SD3.5 uses the MM-DiT (Multimodal Diffusion Transformer) architecture, a transformer backbone that replaces the UNet used in earlier Stable Diffusion models. Text conditioning comes from three public text encoders: OpenAI CLIP-L/14, OpenCLIP bigG, and Google T5-XXL. Images are decoded by a 16-channel VAE decoder, which is similar to those in prior SD models but omits the post-quantization convolution step. The combination is intended to deliver high-quality image generation efficiently.
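The following is a minimal NumPy sketch of how the three encoders' outputs could be merged into MM-DiT conditioning, and what the 16-channel latent looks like. It assumes the dimensions reported in the SD3 paper (CLIP-L 768, bigG 1280, T5-XXL 4096, 77 tokens per encoder) and an 8x VAE downsampling factor; these values and the helper names `combine_text_embeddings` and `latent_shape` are illustrative assumptions, not code taken from the repository.

```python
import numpy as np

# Assumed dimensions (SD3 paper values, not read from this repo):
# CLIP-L/14 width 768, OpenCLIP bigG width 1280, T5-XXL width 4096, 77 tokens each.
CLIP_L_DIM, CLIP_G_DIM, T5_DIM, TOKENS = 768, 1280, 4096, 77


def combine_text_embeddings(l_out, g_out, t5_out, l_pooled, g_pooled):
    """Hypothetical helper: merge per-encoder outputs into MM-DiT conditioning."""
    # Concatenate the two CLIP token sequences along the channel axis: (77, 2048).
    lg_out = np.concatenate([l_out, g_out], axis=-1)
    # Zero-pad the channels up to T5's width so both streams match: (77, 4096).
    lg_out = np.pad(lg_out, ((0, 0), (0, T5_DIM - lg_out.shape[-1])))
    # Stack CLIP and T5 tokens along the sequence axis: (154, 4096).
    context = np.concatenate([lg_out, t5_out], axis=0)
    # The pooled CLIP vectors form a global conditioning vector: (2048,).
    pooled = np.concatenate([l_pooled, g_pooled], axis=-1)
    return context, pooled


def latent_shape(width, height, channels=16, downsample=8):
    """Hypothetical helper: VAE latent shape, assuming 8x spatial downsampling."""
    return (channels, height // downsample, width // downsample)


rng = np.random.default_rng(0)
context, pooled = combine_text_embeddings(
    rng.standard_normal((TOKENS, CLIP_L_DIM)),
    rng.standard_normal((TOKENS, CLIP_G_DIM)),
    rng.standard_normal((TOKENS, T5_DIM)),
    rng.standard_normal(CLIP_L_DIM),
    rng.standard_normal(CLIP_G_DIM),
)
print(context.shape, pooled.shape, latent_shape(1024, 1024))
# prints: (154, 4096) (2048,) (16, 128, 128)
```

Padding the concatenated CLIP features to the T5 width lets all text tokens share one embedding size, so they can be fed to the transformer as a single sequence.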

Quick Start & Requirements

  • Install: Create a virtual environment (python3 -m venv .sd3.5, source .sd3.5/bin/activate) and install dependencies (python3 -m pip install -r requirements.txt).
  • Models: Download SD3.5 Large/Turbo/Medium, OpenAI CLIP-L, OpenCLIP bigG, and Google T5-XXL weights from HuggingFace into a models directory. ControlNet weights are optional.
  • Run: Execute inference via python3 sd3_infer.py with specified prompts and model paths. Example: python3 sd3_infer.py --prompt "cute wallpaper art of a cat" --model models/sd3.5_large.safetensors.
  • Resources: Requires downloading several large model files. Hardware requirements are not explicitly stated, but a GPU is implied for practical inference speed.
  • Docs: Refer to the model card for ControlNet preprocessing details.
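The steps above can be wrapped in a small preflight script that checks the weights are in place before calling sd3_infer.py. This is a sketch under stated assumptions: the file names in `TEXT_ENCODERS` are hypothetical placeholders for whatever you downloaded, and the command uses only the `--prompt` and `--model` flags shown in the example; any other options should be taken from the repository's own documentation.

```python
import shlex
import sys
from pathlib import Path

# Hypothetical file names: adjust to match the weights you placed in models/.
TEXT_ENCODERS = ["clip_l.safetensors", "clip_g.safetensors", "t5xxl.safetensors"]


def missing_weights(model_dir, model_file):
    """Return the expected weight files that are not present in model_dir."""
    folder = Path(model_dir)
    expected = TEXT_ENCODERS + [Path(model_file).name]
    return [name for name in expected if not (folder / name).is_file()]


def build_command(prompt, model_path):
    """Build the argv for sd3_infer.py using only the --prompt and --model flags."""
    return [sys.executable, "sd3_infer.py", "--prompt", prompt, "--model", str(model_path)]


if __name__ == "__main__":
    model = Path("models") / "sd3.5_large.safetensors"
    missing = missing_weights("models", model)
    if missing:
        print("Missing weights:", ", ".join(missing))
    else:
        cmd = build_command("cute wallpaper art of a cat", model)
        print(shlex.join(cmd))
        # To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Building the command as an argument list (rather than a single shell string) avoids quoting problems with prompts that contain spaces or quotes.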

Highlighted Details

  • Supports SD3.5 Large, Large-Turbo, and Medium models.
  • Includes inference code for Blur, Canny, and Depth ControlNets for SD3.5 Large.
  • Offers options for prompt files and custom output resolution/postfix.
  • Features a skip_layer_cfg option for SD3.5-Medium for potentially improved structure.
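To make the guidance options concrete, here is a sketch of how classifier-free guidance (CFG) combines conditional and unconditional predictions, with an optional skip-layer term added on top. The skip-layer form shown (pushing the output away from a prediction computed with some transformer layers skipped) is an assumption about the general technique; the exact scales, skipped layers, and step ranges used by sd3_infer.py are not modeled, and `guided_prediction` is a hypothetical name.

```python
import numpy as np


def guided_prediction(cond, uncond, cfg_scale, skip_cond=None, slg_scale=0.0):
    """Hypothetical helper: CFG plus an optional skip-layer guidance term (assumed form)."""
    # Standard CFG: move from the unconditional prediction toward the conditional one.
    out = uncond + cfg_scale * (cond - uncond)
    if skip_cond is not None and slg_scale != 0.0:
        # skip_cond: conditional prediction with some transformer layers skipped.
        # Adding (cond - skip_cond) pushes the result away from that degraded output.
        out = out + slg_scale * (cond - skip_cond)
    return out


cond = np.array([1.0, 2.0])
uncond = np.array([0.0, 1.0])
skip = np.array([0.5, 1.5])
print(guided_prediction(cond, uncond, 4.0))             # [4. 5.]
print(guided_prediction(cond, uncond, 4.0, skip, 2.5))  # [5.25 6.25]
```

With `cfg_scale=1.0` and no skip term, the function returns the conditional prediction unchanged, which is a quick sanity check on the formula.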

Maintenance & Community

The code originates from Stability AI's internal research, public repositories, and contributions from Alex Goodwin and Vikram Voleti. Some code is adapted from ComfyUI's internal Stability implementation and HuggingFace.

Licensing & Compatibility

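The code is licensed under the MIT License, except for some code originating from HuggingFace, which is subject to the Apache 2.0 License. This permissive licensing generally allows commercial use and integration into closed-source projects. It applies to the code only; the model weights are downloaded separately and carry their own license.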

Limitations & Caveats

This repository is described as a "tiny reference implementation" and excludes model weights. While it supports various SD3.5 variants, it is primarily for inference and may not cover all advanced features or training capabilities.

Health Check

Last commit: 6 months ago
Responsiveness: 1 week
Pull requests (30d): 0
Issues (30d): 0
Star history: 112 stars in the last 90 days
