sd3.5 by Stability-AI

Reference library for Stable Diffusion 3.5 inference

Created 1 year ago

1,421 stars

Top 28.4% on SourcePulse

Project Summary

This repository is a reference implementation of Stable Diffusion 3.5 (SD3.5) and SD3 for simple inference. It targets developers and researchers who want to integrate SD3.5 into their applications, and provides foundational code for the text encoders, the VAE decoder, and the MM-DiT architecture.

How It Works

The core model is MM-DiT (Multi-Modal Diffusion Transformer), a transformer-based architecture that replaces the U-Net backbone of earlier Stable Diffusion models. Prompts are encoded with three publicly available text encoders: OpenAI CLIP-L/14, OpenCLIP bigG, and Google T5-XXL. The repository also includes a 16-channel VAE decoder, similar to the one in prior SD models but without the post-quantization convolution step.
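To make the data flow between these components concrete, the following is a minimal shape-bookkeeping sketch. Only the 16-channel VAE comes from the description above. The 8x spatial downsampling, the 77-token sequence length, the encoder widths (768, 1280, 4096), and the way the encoder outputs are combined are assumptions based on how SD3 is commonly described, and the function names are hypothetical rather than taken from this repository.

```python
# Shape arithmetic only: no weights and no tensor library required.
# ASSUMPTIONS (not stated in the summary above): 8x VAE downsampling,
# 77 tokens per encoder, and encoder widths of 768 / 1280 / 4096.

VAE_CHANNELS = 16    # from the text: 16-channel VAE
VAE_DOWNSAMPLE = 8   # assumption


def latent_shape(width, height):
    """Return (channels, latent_height, latent_width) for an image size."""
    if width % VAE_DOWNSAMPLE or height % VAE_DOWNSAMPLE:
        raise ValueError("width and height must be multiples of 8")
    return (VAE_CHANNELS, height // VAE_DOWNSAMPLE, width // VAE_DOWNSAMPLE)


def conditioning_shapes(tokens=77, clip_l=768, clip_g=1280, t5=4096):
    """Return (context_shape, pooled_shape) for the assumed combination.

    Assumed scheme: the two CLIP outputs are concatenated along the
    feature axis, zero-padded up to the T5 width, then stacked with the
    T5 output along the sequence axis. The pooled vector concatenates
    the two CLIP pooled outputs.
    """
    clip_width = clip_l + clip_g
    if clip_width > t5:
        raise ValueError("combined CLIP width exceeds the T5 width")
    context = (tokens + tokens, t5)
    pooled = (clip_width,)
    return context, pooled
```

Under these assumptions, a 1024x1024 image corresponds to a 16x128x128 latent, and the model is conditioned on a 154x4096 context tensor plus a 2048-dimensional pooled vector.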

Quick Start & Requirements

Install: Create a virtual environment (python3 -m venv .sd3.5, source .sd3.5/bin/activate) and install dependencies (python3 -m pip install -r requirements.txt).
Models: Download SD3.5 Large/Turbo/Medium, OpenAI CLIP-L, OpenCLIP bigG, and Google T5-XXL weights from HuggingFace into a models directory. ControlNet weights are optional.
Run: Execute inference via python3 sd3_infer.py with specified prompts and model paths. Example: python3 sd3_infer.py --prompt "cute wallpaper art of a cat" --model models/sd3.5_large.safetensors.
Resources: Several large model files must be downloaded. Hardware requirements are not stated, but a GPU is implied for practical inference.
Docs: Refer to the model card for ControlNet preprocessing details.
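The steps above can be wrapped in a small helper script. This is a sketch under stated assumptions, not part of the repository: it uses only the `--prompt` and `--model` flags shown in the example command, and the helper names `missing_weights` and `build_infer_command` are hypothetical. The weight filenames are supplied by the caller because, apart from the one model path in the example, the summary does not list them.

```python
import sys
from pathlib import Path


def missing_weights(models_dir, filenames):
    """Return the filenames that are not present in models_dir."""
    root = Path(models_dir)
    return [name for name in filenames if not (root / name).is_file()]


def build_infer_command(prompt, model_path, python=sys.executable):
    """Build the argument list for sd3_infer.py.

    Only --prompt and --model are used, matching the example command;
    any other flag would be an assumption.
    """
    return [python, "sd3_infer.py", "--prompt", prompt, "--model", model_path]


# Example usage (run from the repository root, inside the virtual environment):
#
#   import subprocess
#   cmd = build_infer_command("cute wallpaper art of a cat",
#                             "models/sd3.5_large.safetensors")
#   subprocess.run(cmd, check=True)
```

Passing the arguments as a list, rather than as a single shell string, avoids quoting problems when the prompt contains spaces or quotation marks.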

Highlighted Details

Supports SD3.5 Large, Large-Turbo, and Medium models.
Includes inference code for Blur, Canny, and Depth ControlNets for SD3.5 Large.
Offers options for prompt files and custom output resolution/postfix.
Provides a skip_layer_cfg option for SD3.5 Medium, which may improve the structural coherence of generated images.

Maintenance & Community

The code originates from Stability AI's internal research, public repositories, and contributions from Alex Goodwin and Vikram Voleti. Some code is adapted from ComfyUI's internal Stability implementation and HuggingFace.

Licensing & Compatibility

The code is licensed under the MIT License. Some code originating from HuggingFace is subject to the Apache 2.0 License. Both licenses are permissive and generally allow commercial use and integration into closed-source projects.

Limitations & Caveats

The repository describes itself as a "tiny reference implementation" and does not include model weights, which must be downloaded separately. It supports several SD3.5 variants, but it is primarily for inference and may not cover training or all advanced features.

