k-diffusion by crowsonkb

PyTorch implementation of Karras et al. (2022) diffusion models

Created 3 years ago
2,510 stars

Top 18.6% on SourcePulse

Project Summary

This repository provides an implementation of the Karras et al. (2022) diffusion models for PyTorch, targeting researchers and practitioners in generative AI. It offers enhanced sampling algorithms, transformer-based diffusion models, and utilities for training and evaluation, aiming to improve sample quality and training efficiency.

How It Works

The core of k-diffusion is its implementation of diffusion models, including the Karras et al. (2022) paper's techniques. It introduces a novel image_transformer_v2 model type, inspired by Hourglass Transformer and DiT, which utilizes hierarchical transformers. This architecture employs efficient attention mechanisms like neighborhood attention (via NATTEN) and global attention (via FlashAttention-2) at different levels of the hierarchy, allowing for a flexible trade-off between performance and custom CUDA kernel requirements. It also incorporates soft Min-SNR loss weighting for improved high-resolution training.
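To illustrate, soft Min-SNR weighting can be read as a smooth minimum between the signal-to-noise ratio and a clipping constant gamma. A minimal sketch of that idea; the exact formula, the default gamma, and the function names here are assumptions for illustration, not code from the repository:

```python
def snr(sigma, sigma_data=1.0):
    # Signal-to-noise ratio at noise level sigma.
    return sigma_data ** 2 / sigma ** 2

def soft_min_snr_weight(sigma, gamma=5.0, sigma_data=1.0):
    # Smooth stand-in for min(SNR, gamma): a harmonic-style soft minimum
    # that tracks SNR when SNR << gamma and saturates near gamma when SNR >> gamma.
    s = snr(sigma, sigma_data)
    return s * gamma / (s + gamma)
```

At low sigma (high SNR) the weight saturates near gamma, damping the loss on nearly clean inputs; at high sigma it follows the SNR itself.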

Quick Start & Requirements

  • Install library code: pip install k-diffusion
  • For training/inference scripts: pip install -e <path to repository>
  • Custom CUDA kernels (NATTEN, FlashAttention-2) are recommended for optimal performance with image_transformer_v2.
  • PyTorch installation should support torch.compile().
  • Training example: python train.py --config configs/config_oxford_flowers_shifted_window.json --name flowers_demo_001 --evaluate-n 0 --batch-size 32 --sample-n 36 --mixed-precision bf16
  • Official docs: none linked; usage is driven by the scripts in the repository.
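Once installed, the library's samplers consume a decreasing noise schedule. A pure-Python sketch of the Karras et al. (2022) schedule, mirroring what k_diffusion.sampling.get_sigmas_karras computes (the parameter defaults here are assumptions):

```python
def get_sigmas_karras(n, sigma_min, sigma_max, rho=7.0):
    # Interpolate linearly in sigma^(1/rho) space (Karras et al. 2022, eq. 5),
    # then append 0.0 so the final step lands on a clean sample.
    min_inv = sigma_min ** (1 / rho)
    max_inv = sigma_max ** (1 / rho)
    ramp = [i / (n - 1) for i in range(n)]
    return [(max_inv + t * (min_inv - max_inv)) ** rho for t in ramp] + [0.0]
```

Interpolating in sigma^(1/rho) space concentrates steps near low noise levels, where most of the fine detail is resolved.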

Highlighted Details

  • Implements DPM-Solver, DPM-Solver++(2S), and DPM-Solver++(2M) samplers for improved sample quality, with adaptive step-size control.
  • Supports CLIP-guided sampling from unconditional models.
  • Includes wrappers for v-diffusion-pytorch, OpenAI diffusion, and CompVis diffusion models.
  • Enables log likelihood calculation and FID/KID metrics during training.
  • Supports multi-GPU and multi-node training via Hugging Face Accelerate.
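To make the sampler list concrete, here is a pure-Python paraphrase of the DPM-Solver++(2M) update, a second-order multistep rule in log-sigma time. This is a sketch of the idea behind k_diffusion.sampling.sample_dpmpp_2m, not a drop-in replacement; `denoise` stands in for the wrapped model:

```python
import math

def sample_dpmpp_2m(denoise, x, sigmas):
    # DPM-Solver++(2M): second-order multistep ODE solver in t = -log(sigma) time.
    # `denoise(x, sigma)` should return the model's estimate of the clean sample.
    t_fn = lambda sigma: -math.log(sigma)
    sigma_fn = lambda t: math.exp(-t)
    old_denoised = None
    for i in range(len(sigmas) - 1):
        denoised = denoise(x, sigmas[i])
        if sigmas[i + 1] == 0:
            x = denoised  # final step: jump to the model's clean estimate
            break
        t, t_next = t_fn(sigmas[i]), t_fn(sigmas[i + 1])
        h = t_next - t
        if old_denoised is None:
            # First step: no history yet, fall back to a first-order update.
            x = (sigma_fn(t_next) / sigma_fn(t)) * x - math.expm1(-h) * denoised
        else:
            # Extrapolate from the previous denoised estimate (the multistep part).
            h_last = t - t_fn(sigmas[i - 1])
            r = h_last / h
            d = (1 + 1 / (2 * r)) * denoised - (1 / (2 * r)) * old_denoised
            x = (sigma_fn(t_next) / sigma_fn(t)) * x - math.expm1(-h) * d
        old_denoised = denoised
    return x
```

Each step blends the current and previous denoised estimates, which is what makes the method second-order while still calling the model only once per step.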

Maintenance & Community

  • No specific contributors, sponsorships, or community links (Discord/Slack) are mentioned in the README.

Licensing & Compatibility

  • The README does not explicitly state a license.

Limitations & Caveats

The shifted-window attention variant of image_transformer_v2 is slower and produces worse samples than the NATTEN-based version. Models trained with one attention type must be fine-tuned before they can be used with a different type. The README's inference section is marked "TODO".

Health Check
Last Commit

8 months ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
15 stars in the last 30 days
