sd-scripts by kohya-ss

Training/generation scripts for Stable Diffusion models

Created 3 years ago

6,837 stars

Top 7.4% on SourcePulse

2 Experts Love This Project

Jiaming Song

Chief Scientist at Luma AI

Thierry Moreau

Principal Engineer at NVIDIA; Cofounder of OctoAI

Project Summary

This repository provides a comprehensive suite of scripts for training and generating images with Stable Diffusion models, targeting researchers and power users. It offers advanced training methods like DreamBooth, LoRA, and Textual Inversion, along with model conversion and image generation capabilities, enabling fine-grained control and customization of diffusion models.

How It Works

The scripts leverage PyTorch and Hugging Face's diffusers library, implementing various optimization techniques for efficient training. Key features include support for LoRA (including LoCon and LoRA+), Orthogonal Finetuning (OFT) with optimized calculations, and memory-saving techniques like fused optimizers and optimizer groups for SDXL training. The implementation prioritizes flexibility, allowing users to configure training parameters via TOML files and command-line arguments.
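The combination of TOML files and command-line arguments can be illustrated with a minimal sketch. It is not taken from the repository: the keys learning_rate, max_train_steps, and output_name, the helper load_settings, and the choice to let command-line flags take precedence over the file are all assumptions made for illustration.

```python
import argparse

try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:  # Python 3.10 needs the third-party `tomli` package
    import tomli as tomllib

# Hypothetical config text; the keys are placeholders, not the project's schema.
EXAMPLE_TOML = """
learning_rate = 1e-4
max_train_steps = 1000
output_name = "my_lora"
"""


def load_settings(toml_text, argv):
    """Return settings from TOML, overridden by any CLI flags that were given."""
    settings = tomllib.loads(toml_text)

    parser = argparse.ArgumentParser()
    parser.add_argument("--learning_rate", type=float)
    parser.add_argument("--max_train_steps", type=int)
    parser.add_argument("--output_name", type=str)
    args = parser.parse_args(argv)

    # Flags that were not passed stay None and do not override the file.
    overrides = {k: v for k, v in vars(args).items() if v is not None}
    return {**settings, **overrides}


settings = load_settings(EXAMPLE_TOML, ["--max_train_steps", "200"])
print(settings)
```

Keeping the shared defaults in a file and overriding single values per run is one common way to make experiments reproducible; the project's own precedence rules should be checked in its documentation.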

Quick Start & Requirements

Installation: Clone the repository, create a virtual environment, and install dependencies using pip install -r requirements.txt. Specific PyTorch and xformers versions are recommended based on CUDA version (e.g., cu118 or cu121).
Prerequisites: Python 3.10.6+ (3.10.x, 3.11.x, 3.12.x tested) and Git. A CUDA-enabled GPU is essential for training.
Setup: PyTorch must be installed manually in the build that matches your CUDA version. After installing dependencies, run accelerate config to set up the training environment.
Documentation: Usage documentation is primarily in Japanese, with English translations available.
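Before installing, it can help to confirm that the prerequisites listed above are present. The following is a small sketch and not part of the project; check_prerequisites is a hypothetical helper, and it compares only the major and minor Python version, so it does not distinguish 3.10.6 from earlier 3.10 releases.

```python
import importlib.util
import shutil
import sys


def check_prerequisites(min_python=(3, 10)):
    """Report which of the listed prerequisites appear to be present."""
    return {
        "python_ok": sys.version_info[:2] >= min_python,
        "git_found": shutil.which("git") is not None,
        # find_spec only checks that a package is installed; it does not import it.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "accelerate_installed": importlib.util.find_spec("accelerate") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

The check does not verify that the installed PyTorch build matches the local CUDA version; that still has to be confirmed by hand.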

Highlighted Details

Supports DreamBooth, LoRA, Textual Inversion, LoRA+, OFT, and ControlNet training.
Includes advanced features like masked loss, scheduled Huber loss, and block-wise learning rates for LoRA.
Offers memory optimization techniques for SDXL training (Fused optimizer, Optimizer groups).
Supports model conversion between ckpt/safetensors and Diffusers formats.
Includes utilities for image tagging and generation.
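Since LoRA appears throughout the feature list, a toy illustration of the underlying idea may be useful. This is a sketch in plain Python of the standard low-rank update W + (alpha / rank) * up @ down, not the repository's implementation; the names matmul, lora_weight, down, and up are chosen here for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, down, up, alpha):
    """Return W + (alpha / rank) * up @ down, the LoRA-adjusted weight."""
    rank = len(down)          # down has shape (rank, in_features)
    scale = alpha / rank
    delta = matmul(up, down)  # shape (out_features, in_features)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Frozen 2x3 base weight with a rank-1 update.
w = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
down = [[1.0, 2.0, 3.0]]  # shape (rank=1, in_features=3)
up = [[0.5], [0.0]]       # shape (out_features=2, rank=1)
merged = lora_weight(w, down, up, alpha=1.0)
print(merged)
```

Only the two small matrices are trained while the base weight stays frozen, which is why LoRA files are far smaller than full checkpoints. In practice the up matrix is typically initialized to zero so that training starts from the unmodified model; nonzero values are used here only to make the effect visible.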

Maintenance & Community

The project is actively maintained, with frequent updates and contributions from a community of developers. Recent updates include support for SD3/SD3.5 (in the sd3 branch), OFT improvements, and various bug fixes and feature additions. The README does not explicitly link to community channels such as Discord or Slack.

Licensing & Compatibility

The majority of scripts are licensed under the Apache License 2.0 (ASL 2.0). Some components carry different licenses (MIT, BSD-3-Clause). ASL 2.0 permits commercial use and use within closed-source projects, subject to its attribution and notice requirements.

Limitations & Caveats

Documentation is predominantly in Japanese, which may pose a barrier for non-Japanese speakers. Some memory-saving options trade precision for lower memory use; full_bf16, for example, may reduce training accuracy. SD3/SD3.5 support lives in the separate sd3 branch rather than in the main line of development.

Health Check

Last Commit: 3 weeks ago
Responsiveness: 1 day

Star History

57 stars in the last 30 days