sd-scripts by kohya-ss

Training/generation scripts for Stable Diffusion models

created 2 years ago
6,459 stars

Top 8.1% on sourcepulse

Project Summary

This repository provides a comprehensive suite of scripts for training and generating images with Stable Diffusion models, targeting researchers and power users. It offers advanced training methods like DreamBooth, LoRA, and Textual Inversion, along with model conversion and image generation capabilities, enabling fine-grained control and customization of diffusion models.

How It Works

The scripts leverage PyTorch and Hugging Face's diffusers library, implementing various optimization techniques for efficient training. Key features include support for LoRA (including LoCon and LoRA+), Orthogonal Finetuning (OFT) with optimized calculations, and memory-saving techniques like fused optimizers and optimizer groups for SDXL training. The implementation prioritizes flexibility, allowing users to configure training parameters via TOML files and command-line arguments.
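LoRA, the core of several of these methods, keeps a pretrained weight matrix W frozen and learns a low-rank update B·A scaled by alpha / rank. The sketch below shows that arithmetic with plain Python lists so it runs anywhere. It is not the sd-scripts implementation, which works on PyTorch tensors and also covers variants such as LoCon and LoRA+. The helper names (matmul, lora_effective_weight, linear) are hypothetical.

```python
# Minimal sketch of the LoRA update W' = W + (alpha / rank) * B @ A.
# Plain lists are used for illustration only; real training uses PyTorch tensors.

def matmul(x, y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / rank) * B @ A.

    w: out x in frozen base weight
    a: rank x in trainable down-projection
    b: out x rank trainable up-projection
    """
    rank = len(a)
    scale = alpha / rank
    delta = matmul(b, a)  # out x in low-rank update
    return [[wij + scale * dij for wij, dij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def linear(x, w):
    """Apply y = W x to a single input vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


if __name__ == "__main__":
    w = [[1.0, 0.0], [0.0, 1.0]]  # frozen 2x2 base weight
    a = [[1.0, 0.0]]              # rank-1 down-projection (1x2)
    b = [[0.0], [1.0]]            # rank-1 up-projection (2x1)
    print(linear([3.0, 4.0], lora_effective_weight(w, a, b, alpha=1.0)))
```

Only A and B (here 2 + 2 numbers instead of 4) would be trained. Because B is commonly initialized to zeros, the adapted layer starts out identical to the base model.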

Quick Start & Requirements

  • Installation: Clone the repository, create a virtual environment, and install dependencies using pip install -r requirements.txt. Specific PyTorch and xformers versions are recommended based on CUDA version (e.g., cu118 or cu121).
  • Prerequisites: Python 3.10.6+ (3.10.x, 3.11.x, and 3.12.x tested) and Git. A CUDA-capable GPU is required for training.
  • Setup: PyTorch must be installed manually to match your CUDA version, and the accelerate config command is run to configure the training environment.
  • Documentation: Usage documentation is primarily in Japanese, with English translations available.
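Before installing, it can help to confirm that the interpreter and GPU meet the prerequisites above. The following is a hypothetical pre-flight check, not a script shipped with sd-scripts. The version bounds come from the list above; torch.cuda.is_available() is a real PyTorch call and is only invoked if PyTorch is already installed.

```python
import importlib.util
import sys

# Version bounds taken from the prerequisites list; helper names are hypothetical.
TESTED_MINORS = {(3, 10), (3, 11), (3, 12)}
MIN_VERSION = (3, 10, 6)


def python_version_ok(version=None):
    """Return True if the version is >= 3.10.6 and in a tested minor series."""
    v = tuple(version or sys.version_info[:3])
    return v >= MIN_VERSION and v[:2] in TESTED_MINORS


def cuda_available():
    """Return True if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False  # PyTorch must be installed manually to match your CUDA version
    import torch
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    print("Python version OK:", python_version_ok())
    print("CUDA available:", cuda_available())
```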

Highlighted Details

  • Supports DreamBooth, LoRA, Textual Inversion, LoRA+, OFT, and ControlNet training.
  • Includes advanced features like masked loss, scheduled Huber loss, and block-wise learning rates for LoRA.
  • Offers memory optimization techniques for SDXL training (Fused optimizer, Optimizer groups).
  • Supports model conversion between ckpt/safetensors and Diffusers formats.
  • Includes utilities for image tagging and generation.
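Masked loss and scheduled Huber loss both change how the per-pixel training error is aggregated. The sketch below combines the two ideas: the standard Huber loss (quadratic for small residuals, linear for large ones) whose threshold delta depends on the diffusion timestep, averaged only over pixels where a mask is set. The exponential delta schedule and all function names are hypothetical; sd-scripts may use a different Huber variant and schedule.

```python
def huber(residual, delta):
    """Standard Huber loss: quadratic for |r| <= delta, linear beyond."""
    r = abs(residual)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)


def hypothetical_delta_schedule(timestep, num_timesteps=1000, delta_max=1.0, delta_min=0.1):
    """Hypothetical schedule: decay delta exponentially from delta_max to delta_min."""
    frac = timestep / (num_timesteps - 1)
    return delta_max * (delta_min / delta_max) ** frac


def masked_huber_loss(pred, target, mask, timestep):
    """Mask-weighted mean Huber loss; mask values are weights in [0, 1]."""
    delta = hypothetical_delta_schedule(timestep)
    weighted = [m * huber(p - t, delta) for p, t, m in zip(pred, target, mask)]
    total_weight = sum(mask)
    return sum(weighted) / total_weight if total_weight > 0 else 0.0


if __name__ == "__main__":
    # The second element has a huge error but is masked out, so it is ignored.
    print(masked_huber_loss([0.5, 10.0], [0.0, 0.0], [1, 0], timestep=0))
```

Compared with plain MSE, the linear tail limits how much a few outlier pixels can dominate the gradient.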

Maintenance & Community

The project is actively maintained, with frequent updates and contributions from a community of developers. Recent updates include support for SD3/SD3.5 (in the sd3 branch), OFT improvements, and various bug fixes and feature additions. The README does not link to community channels such as Discord or Slack.

Licensing & Compatibility

The majority of scripts are licensed under Apache License 2.0 (ASL 2.0). Some components have different licenses (MIT, BSD-3-Clause). ASL 2.0 is generally permissive for commercial use and linking with closed-source projects.

Limitations & Caveats

Documentation is predominantly in Japanese, which may be a barrier for non-Japanese speakers. Some memory-saving options, such as full_bf16, may reduce accuracy. SD3/SD3.5 support lives in the separate sd3 branch rather than in the main development branch.
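One way full bf16 training can lose accuracy is that bfloat16 keeps only 7 explicit mantissa bits, so values near 1.0 are spaced 2^-7 = 0.0078125 apart and small weight updates can round away entirely. The sketch below emulates bf16 rounding in pure Python by truncating a float32 bit pattern with round-to-nearest-even. It ignores NaN handling and is an illustration of the number format, not of how sd-scripts stores weights.

```python
import struct


def to_bf16(x):
    """Round a float to the nearest bfloat16 value (round-to-nearest-even; NaN not handled)."""
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the upper 16 bits; add a bias so the drop rounds to nearest even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = ((bits + rounding_bias) >> 16) << 16
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFFFFF))[0]


if __name__ == "__main__":
    w = 1.0
    for _ in range(100):
        w = to_bf16(w + 1e-3)  # each small update rounds back to 1.0
    print(w)  # stays at 1.0; in float32 it would have grown to about 1.1
```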

Health Check

  • Last commit: 3 days ago
  • Responsiveness: 1 day
  • Pull requests (30d): 9
  • Issues (30d): 13
  • Star history: 384 stars in the last 90 days
