k-diffusion by crowsonkb

PyTorch implementation of Karras et al. (2022) diffusion models

created 3 years ago
2,492 stars

Top 19.2% on sourcepulse

Project Summary

This repository provides an implementation of the Karras et al. (2022) diffusion models for PyTorch, targeting researchers and practitioners in generative AI. It offers enhanced sampling algorithms, transformer-based diffusion models, and utilities for training and evaluation, aiming to improve sample quality and training efficiency.

How It Works

The core of k-diffusion is its implementation of diffusion models, including the Karras et al. (2022) paper's techniques. It introduces a novel image_transformer_v2 model type, inspired by Hourglass Transformer and DiT, which utilizes hierarchical transformers. This architecture employs efficient attention mechanisms like neighborhood attention (via NATTEN) and global attention (via FlashAttention-2) at different levels of the hierarchy, allowing for a flexible trade-off between performance and custom CUDA kernel requirements. It also incorporates soft Min-SNR loss weighting for improved high-resolution training.
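The summary mentions soft Min-SNR loss weighting without quoting the formula. As a hedged sketch (an assumption, not taken from the repository): the hard Min-SNR weight min(SNR, γ) from Hang et al. can be softened by replacing the min with a smooth harmonic-mean-style interpolation, which behaves the same in both limits but has no kink at SNR = γ:

```python
def min_snr_weight(snr: float, gamma: float = 5.0) -> float:
    """Hard Min-SNR weighting: clamp the loss weight at gamma."""
    return min(snr, gamma)

def soft_min_snr_weight(snr: float, gamma: float = 5.0) -> float:
    """Soft variant (illustrative): snr*gamma/(snr+gamma).
    Tracks snr when snr << gamma, saturates near gamma when snr >> gamma,
    with a smooth transition instead of a hard clamp."""
    return (snr * gamma) / (snr + gamma)

# At high noise (low SNR) both weights follow the SNR itself;
# at low noise (high SNR) both saturate, the soft one smoothly.
weights = []
for sigma in (10.0, 1.0, 0.1):
    snr = 1.0 / sigma**2  # SNR for unit-variance data at noise level sigma
    weights.append((min_snr_weight(snr), soft_min_snr_weight(snr)))
```

The soft weight is always at most the hard one, so it is strictly more conservative at the crossover; the exact weighting and γ used by the repository may differ.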

Quick Start & Requirements

  • Install library code: pip install k-diffusion
  • For training/inference scripts: pip install -e <path to repository>
  • Custom CUDA kernels (NATTEN, FlashAttention-2) are recommended for optimal performance with image_transformer_v2.
  • PyTorch installation should support torch.compile().
  • Training example: python train.py --config configs/config_oxford_flowers_shifted_window.json --name flowers_demo_001 --evaluate-n 0 --batch-size 32 --sample-n 36 --mixed-precision bf16
  • Official docs: none linked; usage is driven by the training and inference scripts in the repository.
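The samplers in this family consume a decreasing noise schedule. As a sketch in plain Python (the function name and default rho follow the Karras et al. (2022) paper; the library's actual helper in k_diffusion.sampling may differ in signature and returns tensors):

```python
def get_sigmas_karras(n: int, sigma_min: float, sigma_max: float, rho: float = 7.0):
    """Karras et al. (2022) noise schedule: n sigmas spaced uniformly
    in sigma^(1/rho) from sigma_max down to sigma_min, with a final 0.0
    appended so the last sampler step lands on clean data."""
    max_inv_rho = sigma_max ** (1 / rho)
    min_inv_rho = sigma_min ** (1 / rho)
    ramp = [i / (n - 1) for i in range(n)]
    sigmas = [(max_inv_rho + t * (min_inv_rho - max_inv_rho)) ** rho for t in ramp]
    return sigmas + [0.0]

# illustrative bounds; real values come from the model's training config
sigmas = get_sigmas_karras(10, 0.1, 80.0)
```

Larger rho concentrates more steps near sigma_min, where fine detail is resolved.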

Highlighted Details

  • Implements DPM-Solver samplers, including DPM-Solver++ (2S and 2M variants), with adaptive step size control for improved sampling quality.
  • Supports CLIP-guided sampling from unconditional models.
  • Includes wrappers for v-diffusion-pytorch, OpenAI diffusion, and CompVis diffusion models.
  • Enables log likelihood calculation and FID/KID metrics during training.
  • Supports multi-GPU and multi-node training via Hugging Face Accelerate.
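The DPM-Solver++(2M) update can be illustrated on scalars. This is a sketch of the multistep rule from the DPM-Solver++ paper, not the repository's tensor implementation; the "perfect" denoiser that always returns the target is a toy assumption that makes the behavior checkable:

```python
import math

def dpmpp_2m_step(x, denoised, old_denoised, sigma, sigma_next, h_last=None):
    """One DPM-Solver++(2M) step in the time variable t = -log(sigma).
    Uses a second-order multistep correction when a previous denoised
    estimate (and previous step size h_last) is available."""
    if sigma_next == 0:
        # final step: the update collapses onto the denoised estimate
        return denoised
    h = math.log(sigma / sigma_next)  # step size in t
    if old_denoised is None or h_last is None:
        d = denoised  # first step: first-order (DDIM-like) update
    else:
        r = h_last / h
        d = (1 + 1 / (2 * r)) * denoised - (1 / (2 * r)) * old_denoised
    return (sigma_next / sigma) * x - math.expm1(-h) * d

# toy run: with a denoiser that always predicts the target exactly,
# the sampler should land exactly on the target at sigma = 0
sigmas = [80.0, 10.0, 1.0, 0.1, 0.0]  # illustrative schedule
target = 3.0
x = 80.0
old_denoised, h_last = None, None
for s, s_next in zip(sigmas, sigmas[1:]):
    denoised = target  # toy "perfect" denoiser
    x = dpmpp_2m_step(x, denoised, old_denoised, s, s_next, h_last)
    old_denoised = denoised
    if s_next > 0:
        h_last = math.log(s / s_next)
```

The multistep correction reuses the previous denoised estimate, so 2M costs one model evaluation per step, unlike single-step second-order methods that need two.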

Maintenance & Community

  • No specific contributors, sponsorships, or community links (Discord/Slack) are mentioned in the README.

Licensing & Compatibility

  • The README does not explicitly state a license.

Limitations & Caveats

The shifted window attention variant of image_transformer_v2 performs worse and is slower than the NATTEN-based version. Models trained with one attention type require fine-tuning to be used with a different type. The inference section is marked as "TODO".

Health Check

  • Last commit: 6 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 46 stars in the last 90 days

Explore Similar Projects

Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Travis Fischer (Founder of Agentic), and 3 more.

consistency_models by openai

6k stars · created 2 years ago · updated 1 year ago
PyTorch code for consistency models research paper

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 12 more.

stablediffusion by Stability-AI

41k stars · created 2 years ago · updated 1 month ago
Latent diffusion model for high-resolution image synthesis