ctm by sony

PyTorch implementation for a consistency trajectory model research paper

created 1 year ago
295 stars

Top 90.7% on sourcepulse

Project Summary

This repository provides a PyTorch implementation of Consistency Trajectory Models (CTM), a diffusion model sampling technique that reports state-of-the-art results on CIFAR-10 and ImageNet 64x64. It is aimed at researchers and practitioners in generative modeling who want higher sample fidelity and more control over the sampling process.

How It Works

CTM trains a network to follow the probability flow ordinary differential equation (ODE) trajectory of a diffusion model. This gives more sampling options and a better trade-off between computational cost and sample quality than traditional diffusion sampling methods, so the sampling process can be adjusted to fit different computational budgets.
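To make the idea of a probability flow ODE trajectory concrete, here is a minimal toy sketch, not code from the repository. It assumes 1-D Gaussian data with standard deviation SIGMA_DATA and a noise schedule where the noise level equals the time t, in which case the ODE has a closed-form solution. The names exact_jump and euler_solve are hypothetical and chosen for illustration; in CTM a neural network is trained to approximate this kind of jump between two times for real data, where no closed form exists.

```python
import math

SIGMA_DATA = 0.5  # assumed std of the toy 1-D Gaussian data distribution


def exact_jump(x_t, t, s, sigma_data=SIGMA_DATA):
    """Closed-form ODE solution from time t to time s for N(0, sigma_data^2) data."""
    return x_t * math.sqrt((sigma_data**2 + s**2) / (sigma_data**2 + t**2))


def euler_solve(x_t, t, s, steps, sigma_data=SIGMA_DATA):
    """Integrate dx/dt = x * t / (sigma_data^2 + t^2) from t to s with Euler steps."""
    x, time = x_t, t
    dt = (s - t) / steps  # negative when moving from noise toward data
    for _ in range(steps):
        x += dt * x * time / (sigma_data**2 + time**2)
        time += dt
    return x


# One jump from high noise to low noise, versus two chained jumps.
x_start, t_max, t_min = 40.0, 80.0, 0.002
one_jump = exact_jump(x_start, t_max, t_min)
two_jumps = exact_jump(exact_jump(x_start, t_max, 1.0), 1.0, t_min)
many_steps = euler_solve(x_start, t_max, t_min, steps=10_000)
print(one_jump, two_jumps, many_steps)
```

In this toy setting a single jump, two chained jumps, and a long numerical integration all land on nearly the same point, which illustrates why a model that can jump between arbitrary times allows trading the number of network evaluations against accuracy.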

Quick Start & Requirements

  • Installation: Docker is the recommended installation method. Pull the latest image with docker pull dongjun57/ctm-docker:latest and create a container with GPU support and volume mounts for data and checkpoints.
  • Prerequisites: Requires PyTorch, TensorFlow (with CUDA), piq==0.7.0, joblib==0.14.0, albumentations==0.4.3, lmdb, CLIP, Pillow, flash-attn, xformers, mpi4py, nvidia-ml-py3, and timm==0.4.12. Access to ILSVRC2012 dataset and pre-trained diffusion models is necessary. Reference statistics for FID, sFID, IS, precision, and recall are also required.
  • Resources: Training requires significant computational resources and time (10k-50k iterations for CTM+DSM, >=30k for CTM+DSM+GAN). Evaluation requires >=50k samples.
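Because several prerequisites are pinned to exact versions, it can help to verify an environment before training. The following is a small sketch using only the Python standard library, not a script shipped with the repository. The function name check_pins is hypothetical, and the distribution names are assumed to match the package names in the list above.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the prerequisites list; unpinned packages are omitted.
PINNED = {
    "piq": "0.7.0",
    "joblib": "0.14.0",
    "albumentations": "0.4.3",
    "timm": "0.4.12",
}


def check_pins(pins=PINNED):
    """Return {package: (expected, installed_or_None, matches)} for each pin."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (expected, installed, installed == expected)
    return report


if __name__ == "__main__":
    for name, (expected, installed, ok) in check_pins().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: expected {expected}, found {installed} [{status}]")
```

Running this inside the Docker container should be a quick way to confirm that the image matches the pinned versions before starting a long job.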

Highlighted Details

  • Achieves SOTA FID of 1.73 on CIFAR-10 and 1.92 on ImageNet 64x64 for single-step sampling.
  • Offers diverse sampling options and balances computational budget with sample fidelity.
  • Official PyTorch implementation of the ICLR'24 paper "Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion".

Maintenance & Community

The project is associated with Sony and academic institutions (Stanford). Contact information for key researchers is provided. No explicit community channels (Discord/Slack) or roadmap are mentioned in the README.

Licensing & Compatibility

The README does not state a license. The code is released as the official implementation of a research paper, which suggests a research focus, but compatibility with commercial use is not specified.

Limitations & Caveats

The setup process is complex, heavily relying on Docker and specific versions of numerous dependencies. The project requires substantial datasets (ILSVRC2012) and pre-trained models, along with specific reference statistics for evaluation. Custom dataset integration requires manual code modification.

Health Check

  • Last commit: 9 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 15 stars in the last 90 days
