fast-pixel-cnn  by PrajitR

PixelCNN++ speedup via caching for real-time image generation

Created 8 years ago
481 stars

Top 63.7% on SourcePulse

Project Summary

This repository speeds up image generation (sampling) for PixelCNN++ models. It targets researchers and practitioners who work with autoregressive generative models and need faster inference, and it reports up to a 183x speedup over a naive implementation.

How It Works

The core idea is to cache hidden states that have already been computed inside the convolutional layers. A naive PixelCNN++ implementation recomputes these states redundantly for every generated pixel. This approach instead keeps a queue-like cache for each layer, sized in proportion to the layer's dilation factor, and reuses the cached values. For layers with strided convolutions (downsampling/upsampling), a cache_every parameter controls how often the cache is updated. Removing the redundant computation yields the speedup without sacrificing accuracy.
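
The caching idea can be illustrated with a minimal sketch. It is a 1D toy written in NumPy, not the repository's TensorFlow code: the layer sizes, weights, and the names naive_step, CachedLayer, and generate are all hypothetical, and each "convolution" has just two taps (the current input and the input one dilation back). The sketch only aims to show that a per-layer queue whose length equals the dilation gives the same values as recomputing everything at every step.

```python
from collections import deque
import numpy as np

# Hypothetical toy model: three causal layers with two taps each.
DILATIONS = [1, 2, 4]
rng = np.random.default_rng(0)
WEIGHTS = [rng.normal(size=2) for _ in DILATIONS]


def naive_step(xs):
    """Recompute every layer over the full history; return the last top-layer value."""
    h = np.asarray(xs, dtype=float)
    for d, (w_past, w_now) in zip(DILATIONS, WEIGHTS):
        # past[t] = h[t - d], with zeros where t - d < 0
        past = np.concatenate([np.zeros(d), h])[: len(h)]
        h = np.tanh(w_past * past + w_now * h)
    return h[-1]


class CachedLayer:
    """One layer that remembers only its last `dilation` inputs."""

    def __init__(self, dilation, weights):
        self.w_past, self.w_now = weights
        self.queue = deque([0.0] * dilation)  # zero padding before the sequence starts

    def step(self, x_now):
        x_past = self.queue.popleft()  # the input from `dilation` steps ago
        self.queue.append(x_now)       # keep the current input for later steps
        return np.tanh(self.w_past * x_past + self.w_now * x_now)


def generate(n, cached):
    """Generate n values autoregressively; the next value is the top-layer output."""
    layers = [CachedLayer(d, w) for d, w in zip(DILATIONS, WEIGHTS)]
    xs = [1.0]  # arbitrary seed value
    for _ in range(n):
        if cached:
            h = xs[-1]
            for layer in layers:
                h = layer.step(h)  # constant work per layer per step
        else:
            h = naive_step(xs)     # work grows with the length of the history
        xs.append(float(h))
    return np.array(xs)


if __name__ == "__main__":
    print(np.allclose(generate(16, cached=True), generate(16, cached=False)))
```

In this toy, the cached path does a fixed amount of work per generated value, while the naive path reprocesses the whole history each time. The strided case handled by cache_every is not covered here.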

Quick Start & Requirements

  • Install TensorFlow 1.0, NumPy, and Matplotlib.
  • Download OpenAI's pre-trained PixelCNN++ model (params_cifar.ckpt).
  • Run: CUDA_VISIBLE_DEVICES=0 python generate.py --checkpoint=/path/to/params_cifar.ckpt --save_dir=/path/to/save/generated/images
  • The reported performance was measured on a Tesla K40 GPU.

Highlighted Details

  • Achieves up to 183x speedup for 32x32 image generation.
  • Enables real-time generation of multiple images.
  • Caching mechanism generalizes to dilated and strided convolutions.
  • Applicable to other convolutional autoregressive models like Video Pixel Networks.

Maintenance & Community

The project is authored by Prajit Ramachandran, Tom Le Paine, Pooya Khorrami, and Mohammad Babaeizadeh. A citation is provided for the associated paper: "Fast Generation for Convolutional Autoregressive Models" (arXiv:1704.06001).

Licensing & Compatibility

The repository does not explicitly state a license. The underlying PixelCNN++ model is typically associated with permissive licenses, but this specific implementation's licensing is not detailed.

Limitations & Caveats

The code is tested with Python 3 and TensorFlow 1.0, and may require modifications for other versions. The performance claims are based on a Tesla K40 GPU, and results may vary on different hardware.

Health Check

  • Last commit: 8 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 30 days
