diffusion by mosaicml

Diffusion model training code

Created 2 years ago
707 stars

Top 48.3% on SourcePulse

Project Summary

This repository provides code for training custom Stable Diffusion models on user-provided datasets. It is aimed at researchers and developers who need to fine-tune or retrain diffusion models for specific applications, and it offers a framework for efficient, large-scale training.

How It Works

The project uses the MosaicML Composer library for distributed training, which lets training scale across multiple GPUs. It supports Stable Diffusion v1 and v2 as well as SDXL, with configurations for several resolutions (256x256, 512x512, 1024x1024) and for aspect ratio bucketing. VAE and CLIP latents can be pre-computed ahead of time to raise training throughput.
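
To make the idea of aspect ratio bucketing concrete, here is a minimal sketch of one common way to do it: each image is assigned to the bucket whose width-to-height ratio is closest to its own, so that every batch can share a single shape. The bucket list and the helper names (BUCKETS, nearest_bucket, group_by_bucket) are hypothetical and chosen for illustration; this is not taken from the repository and may differ from how it implements bucketing.

```python
import math

# Hypothetical bucket list: (width, height) pairs with roughly 1024*1024 pixels.
# These values are illustrative assumptions, not the repository's settings.
BUCKETS = [
    (1024, 1024),
    (1152, 896), (896, 1152),
    (1216, 832), (832, 1216),
    (1344, 768), (768, 1344),
]


def nearest_bucket(width, height, buckets=BUCKETS):
    """Return the bucket whose aspect ratio is closest to width/height.

    Ratios are compared in log space so that 2:1 and 1:2 are treated
    as equally far from 1:1.
    """
    target = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - target))


def group_by_bucket(sizes):
    """Map each bucket to the indices of the images assigned to it."""
    groups = {}
    for index, (width, height) in enumerate(sizes):
        groups.setdefault(nearest_bucket(width, height), []).append(index)
    return groups


if __name__ == "__main__":
    image_sizes = [(512, 512), (1920, 1080), (600, 600), (1080, 1920)]
    for bucket, indices in group_by_bucket(image_sizes).items():
        print(bucket, indices)
```

In a real pipeline each group would then be resized or cropped to its bucket resolution and batched separately, so that no batch mixes shapes.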

Quick Start & Requirements

  • Install: clone the repository, then run pip install -e . from its root.
  • Prerequisites: NVIDIA GPUs and a Docker image with PyTorch 1.13 or later (e.g., mosaicml/pytorch:2.1.2_cu121-python3.10-ubuntu20.04).
  • Data Prep: Scripts are available for LAION-5B and COCO Captions; custom datasets require a PyTorch DataLoader.
  • Configuration: Modify YAML files (e.g., SD-2-base-256.yaml) to specify dataset paths and training parameters.
  • Training: Run via composer run.py --config-path yamls/hydra-yamls --config-name <config_name>.yaml.
  • Docs: https://github.com/mosaicml/diffusion
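
When launching several configurations from a script, the training command from the list above can be assembled programmatically. The sketch below only builds the argument list; the helper name build_train_command is hypothetical, and the flags are exactly those quoted in the Training bullet. It assumes the composer launcher is installed and that the command is run from the repository root.

```python
import shlex


def build_train_command(config_name, config_path="yamls/hydra-yamls"):
    """Build the argument list for the training command quoted above.

    build_train_command is a hypothetical helper; the flags themselves
    come from the Training bullet.
    """
    return [
        "composer", "run.py",
        "--config-path", config_path,
        "--config-name", config_name,
    ]


if __name__ == "__main__":
    command = build_train_command("SD-2-base-256.yaml")
    # Print the command in a form that can be pasted into a shell.
    print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) to start a run.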

Highlighted Details

  • In the reported benchmarks, training Stable Diffusion 2.0 base on 1.1B images at 256x256 takes ~81 days on 8 A100s and costs ~$31k.
  • Pre-computing VAE/CLIP latents offline speeds up training, and lowers its cost, by 1.4x.
  • Supports training SDXL models with aspect ratio bucketing for 1024x1024 resolution.
  • Includes offline evaluation scripts for FID, KID, CLIP-FID, and CLIP score.
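
As a rough sanity check on the benchmark figures above, the arithmetic below converts them into GPU-hours and an implied price per GPU-hour, then applies the 1.4x factor. The inputs are the approximate values quoted in the list. Applying the 1.4x factor uniformly to this particular benchmark is an assumption made for illustration, not something the source states.

```python
# Approximate figures quoted in the list above.
DAYS = 81            # wall-clock days of training
NUM_GPUS = 8         # A100 GPUs
COST_USD = 31_000    # quoted total cost
LATENT_SPEEDUP = 1.4 # quoted gain from pre-computed latents


def gpu_hours(days, num_gpus):
    """Total GPU-hours for a run of the given length and size."""
    return days * 24 * num_gpus


def implied_rate(cost_usd, hours):
    """Price per GPU-hour implied by a total cost."""
    return cost_usd / hours


if __name__ == "__main__":
    hours = gpu_hours(DAYS, NUM_GPUS)
    print(f"GPU-hours: {hours}")                                  # 15552
    print(f"Implied $/GPU-hour: {implied_rate(COST_USD, hours):.2f}")  # 1.99
    # Assumption: the 1.4x factor applies uniformly to this benchmark.
    print(f"Days with latents: {DAYS / LATENT_SPEEDUP:.1f}")
    print(f"Cost with latents: ${COST_USD / LATENT_SPEEDUP:,.0f}")
```

The implied rate of about $2 per A100-hour is consistent with the quoted total, which suggests the cost figure is a straight GPU-hour multiplication.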

Maintenance & Community

  • Issues can be filed directly on GitHub.
  • Direct contact available at demo@mosaicml.com.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Because the README does not state a license, suitability for commercial use is unclear. The cost and time estimates assume A100 GPUs and may differ significantly on other hardware.

Health Check

  • Last Commit: 8 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 30 days
