SimpleTuner by bghira

Fine-tuning kit for diffusion models

Created 2 years ago

2,705 stars

Top 17.4% on SourcePulse

4 Experts Love This Project

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Sebastian Raschka

Author of "Build a Large Language Model (From Scratch)"

Omar Sanseviero

DevRel at Google DeepMind

Patrick von Platen

Author of Hugging Face Diffusers; Research Engineer at Mistral

Project Summary

SimpleTuner is a versatile, academic-focused fine-tuning toolkit for diffusion models, designed for simplicity and ease of understanding. It supports a wide array of diffusion models, including Stable Diffusion variants, PixArt, HiDream, and video models like Wan 2.1 and LTX, catering to researchers and power users needing flexible training capabilities.

How It Works

SimpleTuner prioritizes simplicity: it ships with sensible defaults and adopts cutting-edge features only once they are proven. It uses aspect bucketing to handle images of varied sizes and aspect ratios, and it supports quantised LoRA/LyCORIS training (NF4/INT8/FP8), EMA weights for stability, and DeepSpeed integration for memory efficiency. With DeepSpeed, full U-Net training is possible on as little as 12GB of VRAM.
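As a rough illustration of aspect bucketing, the sketch below assigns each image to the training resolution whose aspect ratio is closest to its own, so that every batch can share one tensor shape. The bucket list, the function names, and the absolute-difference distance are assumptions made for this example; they are not taken from SimpleTuner's code.

```python
from collections import defaultdict

# Hypothetical bucket list: (width, height) pairs of roughly equal pixel area.
BUCKETS = [(1024, 1024), (1152, 896), (896, 1152), (1216, 832), (832, 1216)]


def nearest_bucket(width, height, buckets=BUCKETS):
    """Return the bucket whose aspect ratio is closest to the image's."""
    ratio = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - ratio))


def assign_buckets(images, buckets=BUCKETS):
    """Group image names by bucket so each batch shares one resolution.

    `images` maps a name to its (width, height) in pixels.
    """
    grouped = defaultdict(list)
    for name, (width, height) in images.items():
        grouped[nearest_bucket(width, height, buckets)].append(name)
    return dict(grouped)


if __name__ == "__main__":
    sample = {"a.jpg": (2000, 2000), "b.jpg": (3000, 2000), "c.jpg": (1080, 1920)}
    for bucket, names in assign_buckets(sample).items():
        print(bucket, names)
```

A real pipeline would also resize and crop each image to its bucket resolution; this sketch covers only the grouping step.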

Quick Start & Requirements

Install: pip install . (from cloned repo)
Prerequisites: Python 3.10+, PyTorch, Hugging Face libraries. NVIDIA GPUs are recommended (CUDA 11.8+); AMD GPUs (ROCm 5.6+) and Apple Silicon (M-series) are supported with caveats.
Resources: Training SDXL on 12GB VRAM is possible with DeepSpeed but slow. 16GB+ VRAM is recommended for most models, with 24GB+ ideal for higher resolutions and full U-Net training.
Docs: Tutorial, Quick Start, DeepSpeed, Toolkit
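The requirements above can be turned into a small pre-flight check. This is a minimal sketch: the function names and tier labels are hypothetical, the thresholds are copied from this listing, and the VRAM figure is passed in by hand so the example needs nothing beyond the standard library.

```python
import sys

MIN_PYTHON = (3, 10)  # minimum Python version stated in the prerequisites


def python_ok(version_info=None):
    """Return True if the interpreter meets the stated minimum version."""
    v = sys.version_info if version_info is None else version_info
    return tuple(v[:2]) >= MIN_PYTHON


def vram_tier(vram_gb):
    """Map a VRAM size in GB to the guidance given in this listing."""
    if vram_gb >= 24:
        return "ideal"  # higher resolutions and full U-Net training
    if vram_gb >= 16:
        return "recommended"  # most models
    if vram_gb >= 12:
        return "possible-with-deepspeed"  # SDXL, but slow
    return "below-listed-figures"


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("16 GB card:", vram_tier(16))
```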

Highlighted Details

Supports LoRA/LyCORIS, full U-Net, and ControlNet training for various models.
Features such as masked loss, MoE (mixture of experts), and prior regularization enhance training quality.
Direct S3-compatible storage integration (Cloudflare R2, Wasabi) for scalable data handling.
Webhook support for training progress notifications (e.g., Discord).
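To make the masked loss item above concrete, here is one common formulation: a mean squared error in which each element is weighted by a mask, so that masked-out regions contribute nothing to the result. The function name and the normalisation by the mask sum are assumptions for illustration; SimpleTuner's own implementation may differ.

```python
def masked_mse(pred, target, mask):
    """Mean squared error weighted by `mask` (0 ignores an element, 1 keeps it).

    All three arguments are flat sequences of numbers of equal length.
    """
    if not (len(pred) == len(target) == len(mask)):
        raise ValueError("pred, target and mask must have the same length")
    weighted = sum(m * (p - t) ** 2 for p, t, m in zip(pred, target, mask))
    total = sum(mask)
    # With an all-zero mask nothing is being trained on, so report zero loss.
    return weighted / total if total > 0 else 0.0


if __name__ == "__main__":
    # The third element has a large error but is masked out.
    print(masked_mse([1.0, 2.0, 3.0], [1.0, 0.0, 0.0], [1, 1, 0]))
```

Dividing by the mask sum rather than the element count keeps the loss scale comparable between images with small and large masked regions.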

Maintenance & Community

Community support via Discord (Terminus Research Group).
Contributions are welcome.

Licensing & Compatibility

License: MIT.
Compatible with commercial use and closed-source linking.

Limitations & Caveats

Scripts have the potential to damage training data; backups are essential.
Some models have limited support (e.g., text encoder and ControlNet training are not supported for Wan Video, PixArt Sigma, or NVLabs Sana).
Apple Silicon (MPS) support may encounter random bugs.
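Given the warning above about possible damage to training data, one simple precaution is to copy the dataset directory to a timestamped backup before a run. The sketch below uses only the Python standard library; the function name and the directory layout are hypothetical and are not part of SimpleTuner.

```python
import shutil
import time
from pathlib import Path


def snapshot_dataset(dataset_dir, backup_root):
    """Copy `dataset_dir` into `backup_root/<name>-<timestamp>` and return the new path."""
    src = Path(dataset_dir)
    if not src.is_dir():
        raise NotADirectoryError(str(src))
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(backup_root) / f"{src.name}-{stamp}"
    # copytree creates missing parent directories and fails if dest already exists.
    shutil.copytree(src, dest)
    return dest
```

For example, `snapshot_dataset("datasets/my_images", "backups")` would leave the original directory untouched and return the path of the copy; both paths here are placeholders.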

Health Check

Last Commit: 11 hours ago
Responsiveness: 1 day
Pull Requests (30d): 201
Issues (30d)

Star History

69 stars in the last 30 days