ctm by sony

PyTorch implementation for a consistency trajectory model research paper

created 1 year ago

295 stars

Top 90.7% on sourcepulse

Project Summary

This repository is the PyTorch implementation of Consistency Trajectory Models (CTM), a sampling technique for diffusion models that reports state-of-the-art single-step FID on CIFAR-10 and ImageNet 64x64. It is aimed at researchers and practitioners in generative modeling who want higher sample fidelity and more control over the sampling process.

How It Works

CTM learns the probability flow Ordinary Differential Equation (ODE) trajectory of a diffusion model: a single network is trained to jump from a noisy sample at one time to the point at any earlier time on the same trajectory. Because the length of each jump is free, sampling can use one long jump or several shorter ones. This gives more diverse sampling options than traditional step-by-step diffusion sampling and lets the user trade computational cost against sample quality to fit a given budget.
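As a rough illustration of what "jumping along the trajectory" means, the sketch below uses a toy case where the jump has a closed form: 1-D Gaussian data with noise added as x_t = x_0 + t * noise. In CTM a trained neural network plays the role of the jump function; the names jump, gamma_sample, and SIGMA_DATA are hypothetical and are not taken from the repository. The multistep loop follows the gamma-sampling idea from the CTM paper as an assumption (jump to a slightly lower noise level, then add fresh noise back), since this summary does not spell out the sampler.

```python
import math
import random

SIGMA_DATA = 0.5  # assumed std of the toy 1-D Gaussian data distribution


def jump(x_t, t, s, sigma_data=SIGMA_DATA):
    """Exact probability-flow ODE jump from time t to time s <= t.

    For N(0, sigma_data^2) data and x_t = x_0 + t * noise, the marginal at
    time t is N(0, sigma_data^2 + t^2), and the ODE solution rescales x_t
    to the standard deviation of the target time s.
    """
    return x_t * math.sqrt((sigma_data**2 + s**2) / (sigma_data**2 + t**2))


def gamma_sample(x_T, times, gamma, rng):
    """Walk down `times` (decreasing, ending at 0) with gamma-sampling.

    gamma = 0 is fully deterministic; gamma = 1 jumps to time 0 and
    re-noises to the next time at every step.
    """
    x = x_T
    for t_cur, t_next in zip(times[:-1], times[1:]):
        # Jump slightly below t_next, then add noise so the total noise
        # level is exactly t_next again: s^2 + (gamma * t_next)^2 = t_next^2.
        s = math.sqrt(1.0 - gamma**2) * t_next
        x = jump(x, t_cur, s) + gamma * t_next * rng.gauss(0.0, 1.0)
    return x


rng = random.Random(0)
times = [80.0, 10.0, 1.0, 0.0]  # illustrative noise levels, not the repo's schedule
x_T = rng.gauss(0.0, math.sqrt(SIGMA_DATA**2 + times[0] ** 2))

print("single jump :", jump(x_T, times[0], 0.0))
print("3 jumps, g=0:", gamma_sample(x_T, times, gamma=0.0, rng=rng))
print("3 jumps, g=1:", gamma_sample(x_T, times, gamma=1.0, rng=rng))
```

With gamma = 0 the three short jumps land on the same point as the single long jump, because the exact jumps compose along one trajectory. A learned model only approximates this, which is where the cost versus quality trade-off comes from.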

Quick Start & Requirements

Installation: Docker is the recommended installation method. Pull the latest image with docker pull dongjun57/ctm-docker:latest, then create a container with GPU support and volume mounts for data and checkpoints.
Prerequisites: Requires PyTorch, TensorFlow (with CUDA), piq==0.7.0, joblib==0.14.0, albumentations==0.4.3, lmdb, CLIP, Pillow, flash-attn, xformers, mpi4py, nvidia-ml-py3, and timm==0.4.12. Access to the ILSVRC2012 dataset and to pre-trained diffusion models is necessary. Reference statistics for computing FID, sFID, IS, precision, and recall are also required.
Resources: Training requires significant computational resources and time (10k-50k iterations for CTM+DSM, at least 30k for CTM+DSM+GAN). Evaluation requires at least 50k generated samples.
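The installation step can be sketched as a small helper that assembles the two Docker commands. Only the image name comes from the text above; the container name, the host paths, and the in-container mount points /data and /checkpoints are placeholders chosen for illustration, and docker_run_command is a hypothetical helper rather than anything shipped with the project.

```python
import shlex

IMAGE = "dongjun57/ctm-docker:latest"  # image name given in the summary


def docker_run_command(data_dir, ckpt_dir, name="ctm"):
    """Build a `docker run` argument list with GPU access and two bind mounts."""
    return [
        "docker", "run", "-it",
        "--name", name,                    # container name is an assumption
        "--gpus", "all",                   # expose all GPUs to the container
        "-v", f"{data_dir}:/data",         # container path is an assumption
        "-v", f"{ckpt_dir}:/checkpoints",  # container path is an assumption
        IMAGE,
    ]


# Print the commands instead of running them, so they can be reviewed first.
print(shlex.join(["docker", "pull", IMAGE]))
print(shlex.join(docker_run_command("/path/to/data", "/path/to/checkpoints")))
```

Printing the commands rather than executing them keeps the sketch safe to run anywhere; the argument list can be passed to subprocess.run once the paths are adjusted.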

Highlighted Details

Achieves SOTA FID of 1.73 on CIFAR-10 and 1.92 on ImageNet 64x64 for single-step sampling.
Offers diverse sampling options and balances computational budget with sample fidelity.
Official PyTorch implementation of the ICLR'24 paper "Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion".
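For readers unfamiliar with the metric, FID is the Fréchet distance between two Gaussians fitted to image features (Inception features in the standard setup), one for generated samples and one for the reference statistics. The sketch below is a simplified variant that assumes diagonal covariances, so the matrix square root in the full formula reduces to a per-dimension expression. It is not the repository's evaluation code, and diagonal_frechet_distance is a hypothetical name.

```python
import math
from statistics import fmean, variance


def diagonal_frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussian fits, assuming diagonal covariances.

    Each argument is a list of equal-length feature vectors (at least two).
    Per dimension the distance is (mean_a - mean_b)^2 + (std_a - std_b)^2.
    """
    total = 0.0
    # zip(*feats) iterates over feature dimensions (columns).
    for col_a, col_b in zip(zip(*feats_a), zip(*feats_b)):
        mean_gap = fmean(col_a) - fmean(col_b)
        std_gap = math.sqrt(variance(col_a)) - math.sqrt(variance(col_b))
        total += mean_gap**2 + std_gap**2
    return total


generated = [[0.0, 0.0], [2.0, 2.0]]  # toy "features", not real Inception outputs
reference = [[1.0, 1.0], [3.0, 3.0]]
print(diagonal_frechet_distance(generated, reference))
```

Lower is better: identical feature statistics give a distance of zero, and the estimate is only stable with many samples, which is consistent with the requirement of at least 50k samples for evaluation.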

Maintenance & Community

The project is associated with Sony and academic institutions (Stanford). Contact information for key researchers is provided. No explicit community channels (Discord/Slack) or roadmap are mentioned in the README.

Licensing & Compatibility

The README does not state a license. The code is released as the official implementation of a research paper, which suggests research use, but compatibility with commercial use is not specified.

Limitations & Caveats

The setup process is complex: it relies heavily on Docker and on specific versions of numerous dependencies. The project also requires a large dataset (ILSVRC2012), pre-trained diffusion models, and specific reference statistics for evaluation. Integrating a custom dataset requires manual code modification.
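When setting up outside the Docker image, a quick check of the pinned packages can catch version drift early. The sketch below compares installed versions against the four pins listed under Prerequisites using the standard library's importlib.metadata; find_mismatches is a hypothetical helper, and the unpinned dependencies are not checked.

```python
from importlib import metadata

# Pins copied from the Prerequisites list; other dependencies are unpinned there.
PINS = {
    "piq": "0.7.0",
    "joblib": "0.14.0",
    "albumentations": "0.4.3",
    "timm": "0.4.12",
}


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: (wanted, found)} for missing or mismatched packages.

    `found` is None when the package is not installed.
    """
    problems = {}
    for name, wanted in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            problems[name] = (wanted, found)
    return problems


for name, (wanted, found) in find_mismatches(PINS).items():
    print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

The get_version parameter exists only so the check can be exercised without installing anything; in normal use the default is enough.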

