improved-diffusion by openai

Image diffusion codebase for research

Created 4 years ago
3,662 stars

Top 13.2% on SourcePulse

Project Summary

This repository provides the codebase for improved denoising diffusion probabilistic models, enabling researchers and practitioners to train diffusion models and sample high-quality images. It offers implementations of multiple diffusion objectives, noise schedules, and conditional generation, with pre-trained models available for the ImageNet, CIFAR-10, and LSUN datasets.

How It Works

The project implements diffusion models that learn to reverse a gradual noising process. It supports different noise schedules (linear, cosine) and diffusion objectives (e.g., L_hybrid, L_vlb) to improve sample quality and training stability. The architecture uses U-Net-style networks with optional features such as learned sigmas, class conditioning, and attention mechanisms for enhanced performance.
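
As a concrete illustration of the cosine schedule from the Improved DDPM paper, here is a minimal Python sketch (a conceptual reimplementation, not the repository's code; the 4,000-step count and the s = 0.008 offset follow the paper):

```python
import math
import numpy as np

def cosine_betas(num_steps, s=0.008, max_beta=0.999):
    # alpha_bar(t) follows a squared-cosine curve, as in Nichol & Dhariwal (2021).
    def alpha_bar(t):
        return math.cos((t + s) / (1 + s) * math.pi / 2) ** 2

    betas = []
    for i in range(num_steps):
        t1 = i / num_steps
        t2 = (i + 1) / num_steps
        # beta_t = 1 - alpha_bar(t) / alpha_bar(t-1), clipped for numerical stability.
        betas.append(min(1 - alpha_bar(t2) / alpha_bar(t1), max_beta))
    return np.array(betas)

betas = cosine_betas(4000)                 # 4,000-step schedule, as in the paper
alphas_cumprod = np.cumprod(1.0 - betas)   # cumulative signal retention per step
```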

Quick Start & Requirements

  • Install via pip install -e .
  • Requires Python and PyTorch.
  • Data preparation involves organizing images into directories; specific scripts are provided for ImageNet, LSUN bedrooms, and CIFAR-10.
  • Training and sampling scripts accept hyperparameter flags for model architecture, diffusion process, and training configurations.
  • Official checkpoints and detailed run flags for various configurations are available (a sampling sketch follows this list).
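
Below is a hedged Python sketch of loading a pre-trained checkpoint and drawing samples. The helper names (model_and_diffusion_defaults, create_model_and_diffusion, p_sample_loop) follow the repository's script_util and diffusion modules, while the flag values and checkpoint filename are illustrative placeholders that must match whichever checkpoint is actually used:

```python
import torch
from improved_diffusion.script_util import (
    model_and_diffusion_defaults,
    create_model_and_diffusion,
)

# Start from the repository's default flags and override the ones that matter
# for the chosen checkpoint (values below are illustrative, not prescriptive).
config = model_and_diffusion_defaults()
config.update(dict(image_size=64, num_channels=128, num_res_blocks=3,
                   learn_sigma=True, diffusion_steps=4000, noise_schedule="cosine"))
model, diffusion = create_model_and_diffusion(**config)

# Hypothetical filename; use the checkpoint downloaded from the README links.
model.load_state_dict(torch.load("imagenet64_checkpoint.pt", map_location="cpu"))
model.eval()

with torch.no_grad():
    # p_sample_loop runs the full reverse diffusion from pure noise to images.
    samples = diffusion.p_sample_loop(model, (4, 3, 64, 64), clip_denoised=True)
print(samples.shape)  # (4, 3, 64, 64), values roughly in [-1, 1]
```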

Highlighted Details

  • Supports class-conditional generation and upsampling models.
  • Offers implementations for both the L_hybrid and L_vlb diffusion objectives (sketched after this list).
  • Provides configurations for linear and cosine noise schedules.
  • Includes pre-trained checkpoints for ImageNet (64x64), CIFAR-10 (32x32), and LSUN (256x256).
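
For reference, the relationship between the two objectives, as described in the Improved DDPM paper (a conceptual sketch, not the repository's code): L_hybrid mixes the simple noise-prediction MSE with a down-weighted variational-bound term that trains the learned variances.

```python
LAMBDA_VLB = 0.001  # weight used for the hybrid objective in the paper

def simple_loss(eps, eps_pred):
    # Standard DDPM objective: predict the noise added at step t.
    return ((eps - eps_pred) ** 2).mean()

def hybrid_loss(eps, eps_pred, vlb_term):
    # vlb_term is the variational-bound term; per the paper, the mean prediction
    # is detached inside it so the VLB only updates the learned variances.
    return simple_loss(eps, eps_pred) + LAMBDA_VLB * vlb_term
```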

Maintenance & Community

This project is from OpenAI. The README does not describe community channels or an active maintenance status.

Licensing & Compatibility

The repository is released under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

Training these models is computationally intensive and requires significant GPU resources, often necessitating distributed training setups (e.g., using MPI). Batch sizes specified in the README are for single-GPU training, and users may need to adjust --batch_size or use --microbatch for memory-constrained environments.
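
As an illustration of what --microbatch amounts to, here is a hedged gradient-accumulation sketch (conceptual only; the repository's own training loop handles this internally, and diffusion_loss_fn is a hypothetical stand-in for the model's loss computation):

```python
import torch

def train_step(model, diffusion_loss_fn, optimizer, batch, microbatch=4):
    # Split one logical batch into small chunks so peak GPU memory stays bounded,
    # while the accumulated gradient matches a single full-batch update.
    optimizer.zero_grad()
    n = batch.shape[0]
    for start in range(0, n, microbatch):
        chunk = batch[start:start + microbatch]
        loss = diffusion_loss_fn(model, chunk)
        # Scale so the summed gradients equal one full-batch backward pass.
        (loss * chunk.shape[0] / n).backward()
    optimizer.step()
```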

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 29 stars in the last 30 days

Explore Similar Projects

Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 1 more.

cycle-diffusion by ChenWu98

PyTorch code for a diffusion-model latent-space research paper

Created 2 years ago · Updated 1 year ago
640 stars