diffusion by mosaicml

Diffusion model training code

Created 2 years ago

712 stars

Top 48.2% on SourcePulse

3 Experts Love This Project

Patrick von Platen

Author of Hugging Face Diffusers; Research Engineer at Mistral

Hanlin Tang

CTO Neural Networks at Databricks; Cofounder of MosaicML

Jonathan Frankle

Chief AI Scientist at Databricks

Project Summary

This repository provides code for training custom Stable Diffusion models on user-provided datasets. It targets researchers and developers who need to fine-tune or retrain diffusion models for specific applications, and it offers a framework for efficient, large-scale training.

How It Works

The project leverages the MosaicML Composer library for distributed training, enabling efficient scaling across multiple GPUs. It supports training Stable Diffusion v1 and v2, as well as SDXL models, with configurations for different resolutions (256x256, 512x512, 1024x1024) and aspect ratio bucketing. The framework allows for pre-computation of VAE and CLIP latents to optimize training throughput.
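To illustrate the idea behind aspect ratio bucketing, here is a minimal sketch in plain Python. It is not the repository's implementation: the bucket list and the function names `nearest_bucket` and `group_by_bucket` are hypothetical, chosen only to show how images might be grouped by shape so that each batch shares one resolution.

```python
import math
from collections import defaultdict

# Hypothetical (width, height) buckets with roughly 1024x1024 pixels each.
# These values are illustrative and are NOT taken from the repository.
BUCKETS = [(1024, 1024), (1152, 896), (896, 1152), (1344, 768), (768, 1344)]


def nearest_bucket(width, height, buckets=BUCKETS):
    """Return the bucket whose aspect ratio is closest to the image's.

    Ratios are compared in log space so that 2:1 and 1:2 are treated
    as equally far from 1:1.
    """
    ratio = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - ratio))


def group_by_bucket(sizes, buckets=BUCKETS):
    """Map each bucket to the indices of the images assigned to it."""
    groups = defaultdict(list)
    for index, (width, height) in enumerate(sizes):
        groups[nearest_bucket(width, height, buckets)].append(index)
    return dict(groups)


if __name__ == "__main__":
    image_sizes = [(2000, 2000), (1920, 1080), (1080, 1920), (500, 500)]
    for bucket, indices in group_by_bucket(image_sizes).items():
        print(bucket, indices)
```

In a sketch like this, each batch would be drawn from a single bucket and its images resized or cropped to that bucket's resolution, so non-square images need less cropping than they would at a fixed square size.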

Quick Start & Requirements

Install: Clone the repository, then run pip install -e . from its root.
Prerequisites: NVIDIA GPUs and a Docker image with PyTorch 1.13 or later (e.g., mosaicml/pytorch:2.1.2_cu121-python3.10-ubuntu20.04).
Data Prep: Scripts are available for LAION-5B and COCO Captions; custom datasets require a PyTorch DataLoader.
Configuration: Modify YAML files (e.g., SD-2-base-256.yaml) to specify dataset paths and training parameters.
Training: Run via composer run.py --config-path yamls/hydra-yamls --config-name <config_name>.yaml.
Docs: https://github.com/mosaicml/diffusion

Highlighted Details

Benchmarks show training Stable Diffusion 2.0 base on 1.1B images at 256x256 takes ~81 days with 8 A100s, costing ~$31k.
Pre-computing VAE/CLIP latents offline cuts training time and cost by a factor of about 1.4.
Supports training SDXL models with aspect ratio bucketing for 1024x1024 resolution.
Includes offline evaluation scripts for FID, KID, CLIP-FID, and CLIP score.
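As a rough sanity check on the benchmark figures above, the sketch below derives the GPU-hours and per-GPU-hour price implied by ~81 days on 8 A100s for ~$31k. The function names are hypothetical, and applying the 1.4x factor to this particular configuration is an assumption made here, not something the benchmark states.

```python
def implied_gpu_hours(days, num_gpus):
    """Total GPU-hours for a run of `days` wall-clock days on `num_gpus` GPUs."""
    return days * 24 * num_gpus


def implied_hourly_rate(total_cost, gpu_hours):
    """Price per GPU-hour implied by a total cost."""
    return total_cost / gpu_hours


# Figures quoted in the summary (approximate values).
DAYS, GPUS, COST_USD = 81, 8, 31_000
SPEEDUP = 1.4  # assumption: the latent pre-computation factor applies to this run

gpu_hours = implied_gpu_hours(DAYS, GPUS)
rate = implied_hourly_rate(COST_USD, gpu_hours)

if __name__ == "__main__":
    print(f"GPU-hours: {gpu_hours}")
    print(f"Implied price: ${rate:.2f} per GPU-hour")
    print(f"With pre-computed latents: ~{DAYS / SPEEDUP:.0f} days, ~${COST_USD / SPEEDUP:,.0f}")
```

The quoted figures work out to about $2 per A100-hour, so the day count and the cost are consistent with each other. Estimates for other hardware would need different throughput and pricing inputs.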

Maintenance & Community

Issues can be filed directly on GitHub.
Direct contact is available at demo@mosaicml.com.

Licensing & Compatibility

The README does not explicitly state a license, so compatibility with commercial use or closed-source linking is unspecified.

Limitations & Caveats

The absence of a stated license may hinder commercial adoption. The cost and time estimates assume A100 GPUs and may vary significantly on other hardware.

Health Check

Last Commit: 1 year ago
Responsiveness: Inactive

Star History

2 stars in the last 30 days