diffusion by mosaicml

Diffusion model training code

created 2 years ago
703 stars

Top 49.6% on sourcepulse

Project Summary

This repository provides code for training custom Stable Diffusion models on user-provided datasets. It targets researchers and developers who need to fine-tune or retrain diffusion models for specific applications, and it offers a framework for efficient, large-scale training.

How It Works

The project leverages the MosaicML Composer library for distributed training, enabling efficient scaling across multiple GPUs. It supports training Stable Diffusion v1 and v2, as well as SDXL models, with configurations for different resolutions (256x256, 512x512, 1024x1024) and aspect ratio bucketing. The framework allows for pre-computation of VAE and CLIP latents to optimize training throughput.
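As an illustration of aspect ratio bucketing, the sketch below assigns each image to the bucket whose aspect ratio is closest to its own, so that every batch can share one resolution. It is a minimal sketch of the general technique, not the repository's implementation: the bucket list and the function names `nearest_bucket` and `group_by_bucket` are hypothetical.

```python
import math

# Hypothetical (width, height) buckets near 1024x1024 pixels in area.
# These values are illustrative and are not taken from the repository.
BUCKETS = [(1024, 1024), (1152, 896), (896, 1152), (1216, 832), (832, 1216)]


def nearest_bucket(width, height, buckets=BUCKETS):
    """Return the bucket whose aspect ratio is closest to width / height.

    Ratios are compared in log space so that 2:1 and 1:2 are treated
    as equally far from 1:1.
    """
    target = math.log(width / height)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - target))


def group_by_bucket(sizes, buckets=BUCKETS):
    """Map each bucket to the indices of the images assigned to it."""
    groups = {}
    for index, (width, height) in enumerate(sizes):
        groups.setdefault(nearest_bucket(width, height, buckets), []).append(index)
    return groups


sizes = [(2000, 2000), (1920, 1080), (500, 500), (1080, 1920)]
groups = group_by_bucket(sizes)
```

Each group can then be resized to its bucket resolution and batched separately, which avoids cropping every image to a square.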

Quick Start & Requirements

  • Install: pip install -e . after cloning the repository.
  • Prerequisites: NVIDIA GPUs and a Docker image with PyTorch 1.13 or later (e.g., mosaicml/pytorch:2.1.2_cu121-python3.10-ubuntu20.04).
  • Data Prep: Scripts are available for LAION-5B and COCO Captions; custom datasets require a PyTorch DataLoader.
  • Configuration: Modify YAML files (e.g., SD-2-base-256.yaml) to specify dataset paths and training parameters.
  • Training: Run via composer run.py --config-path yamls/hydra-yamls --config-name <config_name>.yaml.
  • Docs: https://github.com/mosaicml/diffusion
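When several configurations need to be launched from a script, the training command from the list above can be assembled programmatically. The sketch below only builds and prints the command; the helper name `build_train_command` is hypothetical, and it assumes the script is run from the repository root with `composer` on the PATH.

```python
import shlex


def build_train_command(config_name, config_path="yamls/hydra-yamls"):
    """Build the argument list for the training command quoted in the summary."""
    return [
        "composer", "run.py",
        "--config-path", config_path,
        "--config-name", config_name,
    ]


cmd = build_train_command("SD-2-base-256.yaml")
command_line = shlex.join(cmd)
print(command_line)

# To actually launch training (assumption: repository root as working directory):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than a single string avoids shell-quoting problems when it is passed to `subprocess.run`.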

Highlighted Details

  • Benchmarks show training Stable Diffusion 2.0 base on 1.1B images at 256x256 takes ~81 days with 8 A100s, costing ~$31k.
  • Pre-computing VAE/CLIP latents offline reduces training time and cost by a factor of about 1.4.
  • Supports training SDXL models with aspect ratio bucketing for 1024x1024 resolution.
  • Includes offline evaluation scripts for FID, KID, CLIP-FID, and CLIP score.
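The benchmark figures above can be sanity-checked with simple arithmetic. The sketch below converts the quoted run into GPU-hours, derives the hourly rate those numbers imply, and applies the 1.4x factor. Two assumptions are made here that the summary does not state: the implied rate is derived from the quoted figures rather than taken from a price list, and the 1.4x reduction is assumed to apply to the 81-day baseline.

```python
def gpu_hours(num_gpus, days):
    """Total GPU-hours for a run that uses num_gpus for the given number of days."""
    return num_gpus * days * 24


# Figures quoted in the summary: 8 A100s, ~81 days, ~$31k.
baseline_days = 81
baseline_cost = 31_000
baseline_hours = gpu_hours(8, baseline_days)

# Hourly rate implied by the quoted figures (an inference, not a stated price).
implied_rate = baseline_cost / baseline_hours

# Assumption: the 1.4x reduction from pre-computed latents applies to this baseline.
speedup = 1.4
precomputed_days = baseline_days / speedup
precomputed_cost = baseline_cost / speedup

print(f"{baseline_hours} GPU-hours at about ${implied_rate:.2f} per GPU-hour")
print(f"With pre-computed latents: about {precomputed_days:.1f} days, ${precomputed_cost:,.0f}")
```

Under these assumptions the quoted run is about 15,552 A100-hours at roughly $2 per GPU-hour, and pre-computing latents would bring it to roughly 58 days and $22k.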

Maintenance & Community

  • Issues can be filed directly on GitHub.
  • Direct contact available at demo@mosaicml.com.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Because the README states no license, commercial adoption carries legal uncertainty. The cost and time estimates were measured on A100 GPUs and may differ significantly on other hardware configurations.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 12 stars in the last 90 days
