ai-toolkit by ostris

Training toolkit for finetuning diffusion models

Created 2 years ago

8,824 stars

Top 5.8% on SourcePulse

7 Experts Love This Project

Chip Huyen

Author of "AI Engineering", "Designing Machine Learning Systems"

Jeff Hammerbacher

Cofounder of Cloudera

Luis Capelo

Cofounder of Lightning AI

Jiaming Song

Chief Scientist at Luma AI

and 3 more!

Project Summary

This toolkit fine-tunes diffusion models and targets users who want to train image and video models on consumer-grade hardware. It offers both a GUI and a CLI, so training runs can be set up interactively or scripted.

How It Works

The toolkit is built on PyTorch and supports network types such as LoRA and LoKr. Its dataset preparation automatically resizes images and handles differing aspect ratios. Two network arguments, only_if_contains and ignore_if_contains, give fine-grained control over which model layers are trained.
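
As an illustration of the layer filtering described above, here is a minimal sketch of substring-based selection. It is not the toolkit's own code: the function name select_layers, the example layer names, and the rule that ignore_if_contains wins when both filters match are assumptions made for this example.

```python
from typing import Iterable, List, Optional


def select_layers(
    layer_names: Iterable[str],
    only_if_contains: Optional[List[str]] = None,
    ignore_if_contains: Optional[List[str]] = None,
) -> List[str]:
    """Return the layer names that pass both substring filters.

    Hypothetical helper: a layer is kept when it matches at least one
    `only_if_contains` entry (if any are given) and matches none of the
    `ignore_if_contains` entries. Letting the ignore list win on overlap
    is an assumption of this sketch.
    """
    selected = []
    for name in layer_names:
        # Skip layers that match none of the required substrings.
        if only_if_contains and not any(s in name for s in only_if_contains):
            continue
        # Skip layers that match any excluded substring.
        if ignore_if_contains and any(s in name for s in ignore_if_contains):
            continue
        selected.append(name)
    return selected


if __name__ == "__main__":
    # Hypothetical layer names, for illustration only.
    names = [
        "blocks.7.proj_out",
        "blocks.20.proj_out",
        "blocks.7.attn.to_q",
    ]
    for name in select_layers(names, only_if_contains=["proj_out"]):
        print(name)
```

Matching on substrings rather than exact names keeps the configuration short: one entry can cover the same sub-module across many blocks.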

Quick Start & Requirements

Installation: Clone the repository, create a Python virtual environment, install the CUDA 12.6 (cu126) build of PyTorch, and then install the requirements.
Prerequisites: Python >3.10, an NVIDIA GPU with sufficient VRAM, and Node.js >18 (for the UI).
UI: Run npm run build_and_start in the ui directory, then open http://localhost:8675.
Auth Token: Set the AI_TOOLKIT_AUTH environment variable to secure the UI.
Documentation: Tutorials and examples are available for FLUX.1 training, RunPod, and Modal.
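
The prerequisites above can be checked before installing. The following is a rough sketch using only the Python standard library; the helper names are hypothetical, and it reads ">3.10" and ">18" literally (Python 3.11 or later, Node.js 19 or later). If the project intends "3.10 or later" and "18 or later", change the comparison to >=.

```python
import os
import re
import shutil
import subprocess
import sys
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Extract the first dotted number from text, e.g. 'v20.11.1' -> (20, 11, 1)."""
    match = re.search(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_newer_than(found: Tuple[int, ...], floor: Tuple[int, ...]) -> bool:
    """Compare only as many components as the floor specifies (strictly greater)."""
    return found[: len(floor)] > floor


def node_version() -> Optional[Tuple[int, ...]]:
    """Return the installed Node.js version, or None if node is not on PATH."""
    node = shutil.which("node")
    if node is None:
        return None
    result = subprocess.run([node, "--version"], capture_output=True, text=True, check=False)
    try:
        return parse_version(result.stdout)
    except ValueError:
        return None


if __name__ == "__main__":
    python = tuple(sys.version_info[:3])
    print("Python", python, "ok" if is_newer_than(python, (3, 10)) else "too old")

    node = node_version()
    if node is None:
        print("Node.js not found (only needed for the UI)")
    else:
        print("Node.js", node, "ok" if is_newer_than(node, (18,)) else "too old")

    # The UI is unsecured unless this variable is set.
    print("AI_TOOLKIT_AUTH set:", bool(os.environ.get("AI_TOOLKIT_AUTH")))
```

This sketch does not check the GPU or its VRAM, since that requires PyTorch to be installed already.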

Highlighted Details

Supports training on consumer-grade hardware, with specific tutorials for 24GB VRAM GPUs.
Offers both a web-based UI and a CLI for flexible interaction.
Includes advanced features such as training only specific layers and support for the LoKr network type.
Provides examples for deployment on platforms like RunPod and Modal.

Maintenance & Community

The project is actively maintained, with the last update on 2025-04-22.
Support and community interaction are primarily directed to a Discord server.

Licensing & Compatibility

The toolkit itself appears to be permissively licensed, but each base model carries its own license. FLUX.1-dev has a non-commercial license, which models trained on it inherit. FLUX.1-schnell is Apache 2.0 licensed.
Commercial use is possible with Apache 2.0 licensed base models such as FLUX.1-schnell; check the license of the specific model before use.

Limitations & Caveats

Training FLUX.1 requires at least 24 GB of VRAM.
Bugs have been reported when running natively on Windows.
FLUX.1-dev has a non-commercial license and requires Hugging Face authentication and license acceptance.
WebP image format has known issues.
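
Given the WebP caveat above, it may help to find WebP files in a dataset before training so they can be converted to another format. The sketch below only locates them; the function name and the example folder path are hypothetical, and nothing here is part of the toolkit.

```python
from pathlib import Path
from typing import List


def find_webp_images(dataset_dir: str) -> List[Path]:
    """Recursively list files under dataset_dir with a .webp extension (any case)."""
    root = Path(dataset_dir)
    return sorted(
        path
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() == ".webp"
    )


if __name__ == "__main__":
    # "dataset" is a placeholder folder name for this example.
    for path in find_webp_images("dataset"):
        print(path)
```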

Health Check

Last Commit: 1 week ago
Responsiveness: 1 week

Star History

759 stars in the last 30 days