vdvae by openai

Official implementation of the paper "Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images"

Created 5 years ago

451 stars



Project Summary

This repository provides the implementation of "Very Deep VAEs" (VDVAE), a hierarchical variational autoencoder that generalizes autoregressive models. According to the paper, it outperforms PixelCNN in log-likelihood on natural image benchmarks while sampling far faster than autoregressive models. It is aimed at researchers and practitioners in deep learning and computer vision who want to explore advanced generative modeling techniques.

How It Works

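VDVAE is a deep hierarchical VAE: many stochastic layers of latent variables, spread across several spatial resolutions and arranged top-down, let it capture complex image distributions. The paper argues that a VAE with enough stochastic layers can represent an autoregressive model as a special case, so depth alone should let VAEs match or beat them. In practice the model outperforms PixelCNN in log-likelihood, and it generates an image in a single pass through the network rather than pixel by pixel. To keep such deep stacks trainable, the paper uses stabilization techniques such as skipping updates with unusually large gradient norms.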
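The objective behind this kind of hierarchical structure can be sketched as follows. This is a minimal illustration, not code from the repository: it assumes each latent layer has a diagonal Gaussian posterior and prior (a common choice in hierarchical VAEs), and the names `Gaussian`, `gaussian_kl` and `neg_elbo_bits_per_dim` are hypothetical. In the real model each layer's prior and posterior are produced by networks conditioned on earlier layers. Here they are simply given numbers. The loss is the reconstruction negative log-likelihood plus one KL term per layer, converted to bits per dimension, the unit image likelihood benchmarks usually report.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Gaussian:
    """Diagonal Gaussian for one latent layer: per-dimension means and log standard deviations."""
    mean: List[float]
    logstd: List[float]


def gaussian_kl(q: Gaussian, p: Gaussian) -> float:
    """KL(q || p) in nats between two diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mq, lq, mp, lp in zip(q.mean, q.logstd, p.mean, p.logstd):
        var_q, var_p = math.exp(2 * lq), math.exp(2 * lp)
        # log(sigma_p / sigma_q) + (sigma_q^2 + (mu_q - mu_p)^2) / (2 sigma_p^2) - 1/2
        total += (lp - lq) + (var_q + (mq - mp) ** 2) / (2 * var_p) - 0.5
    return total


def neg_elbo_bits_per_dim(recon_nll_nats: float,
                          posteriors: Sequence[Gaussian],
                          priors: Sequence[Gaussian],
                          num_dims: int) -> float:
    """Negative ELBO: reconstruction NLL plus one KL term per latent layer, in bits per dimension."""
    kl_nats = sum(gaussian_kl(q, p) for q, p in zip(posteriors, priors))
    return (recon_nll_nats + kl_nats) / (num_dims * math.log(2))


# Toy usage: two latent layers and a 32x32x3 image (3072 dimensions).
posteriors = [Gaussian([0.5, -0.2], [-0.1, 0.0]), Gaussian([0.0], [-0.3])]
priors = [Gaussian([0.0, 0.0], [0.0, 0.0]), Gaussian([0.1], [0.0])]
print(neg_elbo_bits_per_dim(recon_nll_nats=7000.0, posteriors=posteriors,
                            priors=priors, num_dims=3 * 32 * 32))
```

Adding more stochastic layers only adds more KL terms to the sum. Each layer's prior depends on the layers above it, which is what lets a deep enough stack mimic an autoregressive factorization.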

Quick Start & Requirements

Install: Clone the repository and install dependencies including NVIDIA Apex.
Prerequisites: PyTorch 1.6, CUDA 10.1, Numpy 1.16, Ubuntu 18.04, V100 GPUs.
Data: Download datasets using provided setup scripts (setup_cifar10.sh, setup_imagenet.sh, setup_ffhq256.sh, setup_ffhq1024.sh). FFHQ dataset requires manual download of images_1024x1024 subfolder.
Training: Use mpiexec for distributed training (e.g., mpiexec -n 2 python train.py --hps cifar10).
Restoring Models: Download pre-trained checkpoints and use train.py with --restore_path and other restore arguments.
Links: Paper: https://arxiv.org/abs/2011.10650
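With a command like `mpiexec -n 2 python train.py --hps cifar10`, the launcher starts two copies of the script. Each copy has to work out which process it is, which GPU to use and which slice of the data to read. The sketch below illustrates that bookkeeping and is not taken from the repository. It assumes the Open MPI launcher, which exports `OMPI_COMM_WORLD_RANK`, `OMPI_COMM_WORLD_SIZE` and `OMPI_COMM_WORLD_LOCAL_RANK`. The helpers `RankInfo`, `rank_info` and `shard_indices` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional


@dataclass
class RankInfo:
    rank: int        # global index of this process, 0..world_size-1
    world_size: int  # number of processes started by `mpiexec -n N`
    local_rank: int  # index among processes on this machine, commonly used to pick a GPU


def rank_info(env: Optional[Mapping[str, str]] = None) -> RankInfo:
    """Read rank information exported by the Open MPI launcher (assumed); default to one process."""
    env = os.environ if env is None else env
    return RankInfo(
        rank=int(env.get("OMPI_COMM_WORLD_RANK", "0")),
        world_size=int(env.get("OMPI_COMM_WORLD_SIZE", "1")),
        local_rank=int(env.get("OMPI_COMM_WORLD_LOCAL_RANK", "0")),
    )


def shard_indices(num_examples: int, info: RankInfo) -> List[int]:
    """Strided split so each rank sees a disjoint subset: rank r gets r, r + W, r + 2W, ..."""
    return list(range(info.rank, num_examples, info.world_size))


# Simulate the second of two processes launched by `mpiexec -n 2`.
info = rank_info({"OMPI_COMM_WORLD_RANK": "1",
                  "OMPI_COMM_WORLD_SIZE": "2",
                  "OMPI_COMM_WORLD_LOCAL_RANK": "1"})
print(info.local_rank, shard_indices(10, info))  # 1 [1, 3, 5, 7, 9]
```

The strided split guarantees that the ranks together cover every example exactly once. Other MPI implementations use different variable names, so treat the ones above as an Open MPI assumption.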

Highlighted Details

Outperforms PixelCNN-style autoregressive models in log-likelihood on natural image benchmarks, measured in bits per dimension.
Models range from 39M to 125M parameters.
Training on large datasets like ImageNet and FFHQ requires significant GPU resources (e.g., 32 V100s for 2.5 weeks).
Provides pre-trained checkpoints for CIFAR-10, ImageNet (32x32, 64x64), and FFHQ (256x256, 1024x1024).
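The quoted training cost is easier to compare across setups as total GPU-hours. The short calculation below converts "32 V100s for 2.5 weeks" into GPU-hours. It then estimates wall-clock time on a smaller cluster under the loudly optimistic assumption of perfect linear scaling, which real distributed runs rarely reach. The function names are illustrative only.

```python
HOURS_PER_WEEK = 7 * 24


def gpu_hours(num_gpus: int, weeks: float) -> float:
    """Total accelerator time: number of GPUs times wall-clock hours."""
    return num_gpus * weeks * HOURS_PER_WEEK


def weeks_needed(total_gpu_hours: float, num_gpus: int) -> float:
    """Wall-clock weeks on a different GPU count, assuming perfect linear scaling (optimistic)."""
    return total_gpu_hours / (num_gpus * HOURS_PER_WEEK)


budget = gpu_hours(32, 2.5)
print(f"{budget:,.0f} V100-hours")                        # 13,440 V100-hours
print(f"{weeks_needed(budget, 8):.1f} weeks on 8 GPUs")  # 10.0 weeks on 8 GPUs
```

That is 32 × 2.5 × 168 = 13,440 V100-hours. This is why the pre-trained checkpoints are the practical starting point for most users.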

Maintenance & Community

Developed by OpenAI.
No explicit community links (Discord/Slack) or roadmap are provided in the README.

Licensing & Compatibility

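The repository does not explicitly state a license. It appears intended for research use, in line with OpenAI's typical research code releases, but without an explicit license the reuse terms are unclear. Check the repository before building on the code.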

Limitations & Caveats

Depends on older versions of PyTorch (1.6) and CUDA (10.1), plus NVIDIA Apex, which can make installation difficult on current systems.
Training from scratch is computationally intensive, requiring substantial GPU resources and time.
The FFHQ dataset setup requires manual data acquisition.
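Because the code expects specific older dependency versions, a quick environment check before a long run can save time. The sketch below is a hypothetical helper, not part of the repository. It compares installed package versions against the PyTorch 1.6 and Numpy 1.16 pins from the prerequisites using Python's standard `importlib.metadata`. For Apex, which is usually built from source, it only checks that an `apex` module can be imported. CUDA is not checked here.

```python
from importlib import metadata, util
from typing import Dict, Optional

# Versions from the prerequisites list; CUDA is not checked here.
PINNED = {"torch": "1.6", "numpy": "1.16"}


def matches_pin(installed: str, pinned: str) -> bool:
    """True if the installed version agrees with every component of the pin.

    Local build tags are ignored, so '1.6.0+cu101' matches the pin '1.6'.
    """
    installed_parts = installed.split("+")[0].split(".")
    pinned_parts = pinned.split(".")
    return installed_parts[:len(pinned_parts)] == pinned_parts


def check_pins(pins: Dict[str, str] = PINNED) -> Dict[str, Optional[bool]]:
    """Map each distribution to True/False (pin matched or not), or None if it is not installed."""
    report: Dict[str, Optional[bool]] = {}
    for dist, pinned in pins.items():
        try:
            report[dist] = matches_pin(metadata.version(dist), pinned)
        except metadata.PackageNotFoundError:
            report[dist] = None
    return report


def apex_importable() -> bool:
    """Apex is usually built from source, so only check that the `apex` module can be found."""
    return util.find_spec("apex") is not None


if __name__ == "__main__":
    print(check_pins(), "apex:", apex_importable())
```

A `False` entry means a different version is installed, and `None` means the package is missing. Either one is a sign to set up a dedicated environment before training or restoring checkpoints.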

Health Check

Last Commit: 2 years ago

Responsiveness: Inactive

Star History: 2 stars in the last 30 days