deep-vector-quantization by karpathy

Training code for VQ-VAEs with categorical latent bottlenecks

Created 4 years ago
585 stars

Top 55.5% on SourcePulse

Project Summary

This repository provides training code for Vector Quantized Variational Autoencoders (VQ-VAEs), which model discrete latent variables for sequence generation tasks. It targets deep learning researchers and practitioners, offering a foundation for reproducing generative models such as DALL-E.

How It Works

The project implements VQ-VAEs with a categorical latent-variable bottleneck: encoder outputs are quantized against a learned codebook, producing discrete codes that integrate naturally with autoregressive models for sequence modeling. It also supports the Gumbel-Softmax trick for differentiable discrete sampling, offering flexibility in how the latent space is trained.
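
As a concrete illustration, below is a minimal sketch of a vector-quantization bottleneck with a straight-through gradient estimator, following the original VQ-VAE formulation rather than this repository's actual code; all names, shapes, and hyperparameters (num_codes, code_dim, beta) are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    """Categorical bottleneck: snap encoder vectors to nearest codebook entry."""

    def __init__(self, num_codes=512, code_dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_codes, 1.0 / num_codes)
        self.beta = beta  # commitment-loss weight from the VQ-VAE paper

    def forward(self, z_e):
        # z_e: (B, N, code_dim) continuous encoder outputs
        # squared L2 distance from each vector to every codebook entry
        dist = (z_e.pow(2).sum(-1, keepdim=True)
                - 2 * z_e @ self.codebook.weight.t()
                + self.codebook.weight.pow(2).sum(-1))
        idx = dist.argmin(dim=-1)        # (B, N) discrete latent codes
        z_q = self.codebook(idx)         # (B, N, code_dim) quantized vectors
        # codebook loss pulls codes toward encoder outputs; commitment loss
        # keeps encoder outputs close to their assigned codes
        loss = (F.mse_loss(z_q, z_e.detach())
                + self.beta * F.mse_loss(z_e, z_q.detach()))
        # straight-through estimator: forward pass uses z_q,
        # gradients flow back to z_e as if quantization were the identity
        z_q = z_e + (z_q - z_e).detach()
        return z_q, idx, loss
```

The returned idx sequence is what a downstream autoregressive model would be trained on.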

Quick Start & Requirements

This summary does not include setup steps. The project is a PyTorch codebase (training scripts use PyTorch Lightning); install and run instructions are in the repository README.

Highlighted Details

  • Reproduces results from DeepMind's VQ-VAE paper on CIFAR-10.
  • Implements Gumbel-Softmax for differentiable discrete latent variables (see the sketch after this list).
  • Ongoing work towards DALL-E re-implementation, with core components in place.
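
The Gumbel-Softmax variant replaces the hard nearest-neighbor lookup with a differentiable relaxation. A minimal sketch, assuming per-position logits from the encoder and a codebook matrix (illustrative names, not the repository's API):

```python
import torch
import torch.nn.functional as F

def gumbel_quantize(logits, codebook, tau=1.0, hard=True):
    # logits: (B, N, num_codes) unnormalized scores from the encoder
    # codebook: (num_codes, code_dim) learnable embedding matrix
    # hard=True snaps to one-hot on the forward pass while the backward
    # pass uses the soft relaxation (straight-through Gumbel-Softmax)
    soft_one_hot = F.gumbel_softmax(logits, tau=tau, hard=hard, dim=-1)
    z_q = soft_one_hot @ codebook       # (B, N, code_dim) quantized vectors
    idx = soft_one_hot.argmax(dim=-1)   # (B, N) discrete code indices
    return z_q, idx
```

The temperature tau is typically annealed during training, which is one source of the tuning sensitivity noted under Limitations.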

Maintenance & Community

  • Developed by Andrej Karpathy.
  • Work is ongoing; the code assumes familiarity with the underlying approaches.

Licensing & Compatibility

  • License: Not explicitly stated in the README.

Limitations & Caveats

The DALL-E re-implementation is incomplete: it currently uses an MSE reconstruction loss and trains on CIFAR-10 with a smaller network. The data-driven (k-means) initialization of the VQ-VAE codebook is not multi-GPU compatible, yet may be necessary to prevent catastrophic index collapse, where most codebook entries fall out of use. Gumbel-Softmax training can be finicky and slower, requiring careful hyperparameter tuning. A sketch of the initialization idea follows.
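
For intuition, here is a hypothetical sketch of data-driven codebook initialization via k-means on encoder outputs; the function name, Lloyd-iteration count, and single-pool seeding are all assumptions, not the repository's exact recipe.

```python
import torch

@torch.no_grad()
def init_codebook_from_data(codebook, z_e, num_iters=10):
    # codebook: nn.Embedding to seed
    # z_e: (M, code_dim) encoder outputs pooled from warm-up batches
    #      (an assumption, not the repository's exact scheme)
    K = codebook.weight.shape[0]
    # start from K randomly chosen encoder vectors, then run Lloyd iterations
    centers = z_e[torch.randperm(z_e.shape[0])[:K]].clone()
    for _ in range(num_iters):
        assign = torch.cdist(z_e, centers).argmin(dim=1)  # nearest center
        for k in range(K):
            members = z_e[assign == k]
            if len(members) > 0:                          # skip empty clusters
                centers[k] = members.mean(dim=0)
    codebook.weight.copy_(centers)
```

Because the k-means pass runs over one process's data, it does not parallelize naively across devices, which matches the multi-GPU caveat above.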

Health Check

  • Last Commit: 3 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 30 days
