guided-diffusion by openai

Image synthesis codebase for diffusion models

created 4 years ago
6,958 stars

Top 7.4% on sourcepulse

Project Summary

This repository provides the codebase for guided diffusion models, building upon openai/improved-diffusion with enhancements for classifier conditioning and architectural improvements. It enables users to generate high-fidelity images through diffusion processes, offering class-conditional and unconditional sampling, as well as super-resolution capabilities. The project is primarily aimed at researchers and practitioners in generative AI and computer vision.

How It Works

The project implements diffusion models, a class of generative models that learn to reverse a diffusion process that gradually adds noise to data. This codebase specifically focuses on classifier guidance, where a pre-trained classifier is used during the sampling process to steer the generation towards specific classes, improving sample quality and class adherence. It supports various architectures and noise schedules, including cosine and linear schedules, and incorporates techniques like attention mechanisms and scale-shift normalization for enhanced performance.
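
Concretely, classifier guidance amounts to nudging each reverse-diffusion step toward images the classifier assigns to the target class. The following is a minimal PyTorch sketch of that idea, not the repository's exact API: classifier_grad and guided_mean are hypothetical helper names, and the classifier is assumed to accept the noisy image and the timestep.

    # Minimal sketch of classifier guidance (hypothetical helper names, not
    # the repository's exact API). At each reverse step, the predicted mean
    # is shifted by the gradient of the classifier's log-probability for the
    # target class, scaled by the predicted variance and a guidance scale.
    import torch

    def classifier_grad(classifier, x, t, y):
        """Gradient of log p(y | x_t) with respect to the noisy image x_t."""
        with torch.enable_grad():
            x_in = x.detach().requires_grad_(True)
            logits = classifier(x_in, t)                  # classifier sees x_t and the timestep
            log_probs = torch.log_softmax(logits, dim=-1)
            selected = log_probs[range(len(y)), y].sum()  # sum of log p(y | x_t) over the batch
            return torch.autograd.grad(selected, x_in)[0]

    def guided_mean(mean, variance, grad, guidance_scale=1.0):
        """Shift the reverse-process mean toward higher classifier likelihood."""
        return mean + guidance_scale * variance * grad

Larger guidance scales generally trade sample diversity for higher fidelity and stronger class adherence.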

Quick Start & Requirements

  • Install: Clone the repository and install it as a Python package (e.g., pip install -e . from the repository root).
  • Prerequisites: Python 3.x, PyTorch, NumPy, Pillow, SciPy, and potentially CUDA for GPU acceleration. Pre-trained model checkpoints (e.g., 64x64_diffusion.pt, 256x256_classifier.pt) must be downloaded separately.
  • Setup: Requires downloading model checkpoints into a models/ directory. Sampling commands are provided for various resolutions and configurations; see the sketch after this list.
  • Links: Official Paper (Diffusion Models Beat GANs on Image Synthesis)
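
A hypothetical loading-and-sampling sketch, assuming the script_util helpers and the p_sample_loop entry point carried over from openai/improved-diffusion; the exact flag set for each checkpoint is listed in the repository README and must match the downloaded weights.

    # Hypothetical sketch: load a downloaded checkpoint and draw a few samples.
    # Helper names are assumed from the upstream improved-diffusion layout;
    # the real per-checkpoint flags are listed in the repository README.
    import torch
    from guided_diffusion.script_util import (
        model_and_diffusion_defaults,
        create_model_and_diffusion,
    )

    options = model_and_diffusion_defaults()
    options.update(image_size=64, class_cond=True)  # must match the checkpoint's training flags

    model, diffusion = create_model_and_diffusion(**options)
    model.load_state_dict(torch.load("models/64x64_diffusion.pt", map_location="cpu"))
    model.eval()

    # Draw a small batch of class-conditional 64x64 samples.
    samples = diffusion.p_sample_loop(
        model,
        (4, 3, 64, 64),
        model_kwargs={"y": torch.randint(0, 1000, (4,))},
    )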

Highlighted Details

  • Achieves state-of-the-art FID scores on ImageNet, outperforming GANs in image synthesis quality.
  • Supports class-conditional generation, enabling control over the generated image content.
  • Includes super-resolution models for upscaling lower-resolution generated images.
  • Provides example scripts for sampling from pre-trained models and training custom classifiers.

Maintenance & Community

This project is maintained by OpenAI. Further community interaction details are not explicitly provided in the README.

Licensing & Compatibility

The repository is released under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The codebase is primarily focused on research and may require significant computational resources (GPU, memory) for training and sampling, especially at higher resolutions. The README does not detail specific version requirements for dependencies beyond Python 3.x.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
Star History
215 stars in the last 90 days

Explore Similar Projects

Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Travis Fischer (Founder of Agentic), and 3 more.

consistency_models by openai

0.0%
6k
PyTorch code for consistency models research paper
created 2 years ago
updated 1 year ago
Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley), and 2 more.

glide-text2im by openai

0.1%
4k
Text-conditional image synthesis model from research paper
created 3 years ago
updated 1 year ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley), and 4 more.

taming-transformers by CompVis

0.1%
6k
Image synthesis research paper using transformers
created 4 years ago
updated 1 year ago