Training code for VQ-VAEs with categorical latent bottlenecks
This repository provides training code for Vector Quantized Variational Autoencoders (VQ-VAEs), which compress data into discrete latent variables so that generation can be posed as a discrete sequence-modeling task. It targets deep learning researchers and practitioners, offering a foundation for reproducing discrete-latent generative models such as DALL-E.
How It Works
The project implements VQ-VAEs with categorical latent variable bottlenecks: an encoder maps inputs to continuous features, a quantizer snaps each feature to its nearest codebook entry, and the resulting discrete codes can be fed directly to autoregressive models for sequence modeling. It also supports the Gumbel-Softmax relaxation as an alternative to straight-through quantization, enabling differentiable sampling of the discrete codes.
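A minimal sketch of the two bottleneck variants, assuming a standard PyTorch setup; the class, function, and parameter names here (VQBottleneck, gumbel_quantize, beta, tau) are illustrative, not the repo's actual API:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VQBottleneck(nn.Module):
    # Minimal vector-quantization bottleneck (van den Oord et al., 2017);
    # hypothetical names, not this repo's API.
    def __init__(self, num_embeddings=512, embedding_dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_embeddings, embedding_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_embeddings, 1.0 / num_embeddings)
        self.beta = beta  # commitment-loss weight

    def forward(self, z_e):
        # z_e: (B, D) continuous encoder outputs.
        dists = torch.cdist(z_e, self.codebook.weight)  # (B, K) distances to codes
        idx = dists.argmin(dim=-1)                      # nearest code index per vector
        z_q = self.codebook(idx)                        # quantized vectors, (B, D)
        # Codebook loss pulls codes toward encoder outputs; commitment loss
        # (scaled by beta) pulls encoder outputs toward their codes.
        loss = F.mse_loss(z_q, z_e.detach()) + self.beta * F.mse_loss(z_e, z_q.detach())
        # Straight-through estimator: gradients flow through as if z_q == z_e.
        z_q = z_e + (z_q - z_e).detach()
        return z_q, idx, loss

def gumbel_quantize(logits, codebook, tau=1.0):
    # Gumbel-Softmax alternative: sample a (nearly) one-hot code assignment
    # differentiably; hard=True gives discrete forward samples with
    # straight-through gradients.
    soft_onehot = F.gumbel_softmax(logits, tau=tau, hard=True)  # (B, K)
    return soft_onehot @ codebook.weight                        # (B, D)
```

The straight-through trick copies gradients from the quantized output back to the encoder, which is what lets the discrete bottleneck train end to end despite the non-differentiable argmin.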
Quick Start & Requirements
```bash
cd dvq; python vqvae.py --gpus 1 --data_dir /somewhere/to/store/cifar10
```
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The DALL-E re-implementation is incomplete: it uses an MSE loss and trains a smaller network on CIFAR-10. Data-driven codebook initialization, which may be required to prevent catastrophic index collapse, is not multi-GPU compatible. Gumbel-Softmax training can be finicky and slower, requiring careful hyperparameter tuning.
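As a sketch of what such data-driven initialization can look like (a k-means fit over a batch of encoder outputs; the function and argument names below are hypothetical, not the repo's implementation), note that the whole batch must sit on one device, which is the usual source of the multi-GPU incompatibility:

```python
import torch

@torch.no_grad()
def kmeans_init_codebook(codebook, z_e, iters=10):
    # Fit k-means on a batch of encoder outputs z_e with shape (N, D), N >= K,
    # then copy the centroids into the codebook so every code starts inside
    # the data distribution, reducing the risk of index collapse.
    K = codebook.weight.shape[0]
    centers = z_e[torch.randperm(z_e.shape[0])[:K]].clone()  # random data points as seeds
    for _ in range(iters):
        assign = torch.cdist(z_e, centers).argmin(dim=-1)    # nearest centroid per point
        for k in range(K):
            members = z_e[assign == k]
            if len(members) > 0:                             # leave empty clusters as-is
                centers[k] = members.mean(dim=0)
    codebook.weight.copy_(centers)
```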