Training code for VQ-VAEs with categorical latent bottlenecks
This repository provides training code for Vector Quantized Variational Autoencoders (VQ-VAEs), which compress data into discrete latent variables so that generation can be posed as a discrete sequence-modeling task. It targets deep learning researchers and practitioners, offering a foundation for reproducing discrete-latent generative models such as DALL-E.
How It Works
The project implements VQ-VAEs with categorical latent variable bottlenecks: an encoder maps inputs to continuous features, a quantizer snaps each feature to its nearest codebook entry, and the resulting discrete codes can be fed directly to autoregressive models for sequence modeling. It also supports the Gumbel-Softmax relaxation as an alternative to straight-through quantization, enabling differentiable sampling of the discrete codes.
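A minimal sketch of the two bottleneck variants, assuming a standard PyTorch setup; the class, function, and parameter names here (VQBottleneck, gumbel_quantize, beta, tau) are illustrative, not the repo's actual API:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VQBottleneck(nn.Module):
    # Minimal vector-quantization bottleneck (van den Oord et al., 2017);
    # hypothetical names, not this repo's API.
    def __init__(self, num_embeddings=512, embedding_dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_embeddings, embedding_dim)
        self.codebook.weight.data.uniform_(-1.0 / num_embeddings, 1.0 / num_embeddings)
        self.beta = beta  # commitment-loss weight

    def forward(self, z_e):
        # z_e: (B, D) continuous encoder outputs.
        dists = torch.cdist(z_e, self.codebook.weight)  # (B, K) distances to codes
        idx = dists.argmin(dim=-1)                      # nearest code index per vector
        z_q = self.codebook(idx)                        # quantized vectors, (B, D)
        # Codebook loss pulls codes toward encoder outputs; commitment loss
        # (scaled by beta) pulls encoder outputs toward their codes.
        loss = F.mse_loss(z_q, z_e.detach()) + self.beta * F.mse_loss(z_e, z_q.detach())
        # Straight-through estimator: gradients flow through as if z_q == z_e.
        z_q = z_e + (z_q - z_e).detach()
        return z_q, idx, loss

def gumbel_quantize(logits, codebook, tau=1.0):
    # Gumbel-Softmax alternative: sample a (nearly) one-hot code assignment
    # differentiably; hard=True gives discrete forward samples with
    # straight-through gradients.
    soft_onehot = F.gumbel_softmax(logits, tau=tau, hard=True)  # (B, K)
    return soft_onehot @ codebook.weight                        # (B, D)
```

The straight-through trick copies gradients from the quantized output back to the encoder, which is what lets the discrete bottleneck train end to end despite the non-differentiable argmin.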
Quick Start & Requirements
```bash
cd dvq; python vqvae.py --gpus 1 --data_dir /somewhere/to/store/cifar10
```
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The DALL-E re-implementation is incomplete: it uses an MSE loss and trains a smaller network on CIFAR-10. Data-driven codebook initialization, which may be required to prevent catastrophic index collapse, is not multi-GPU compatible. Gumbel-Softmax training can be finicky and slower, requiring careful hyperparameter tuning.
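As a sketch of what such data-driven initialization can look like (a k-means fit over a batch of encoder outputs; the function and argument names below are hypothetical, not the repo's implementation), note that the whole batch must sit on one device, which is the usual source of the multi-GPU incompatibility:

```python
import torch

@torch.no_grad()
def kmeans_init_codebook(codebook, z_e, iters=10):
    # Fit k-means on a batch of encoder outputs z_e with shape (N, D), N >= K,
    # then copy the centroids into the codebook so every code starts inside
    # the data distribution, reducing the risk of index collapse.
    K = codebook.weight.shape[0]
    centers = z_e[torch.randperm(z_e.shape[0])[:K]].clone()  # random data points as seeds
    for _ in range(iters):
        assign = torch.cdist(z_e, centers).argmin(dim=-1)    # nearest centroid per point
        for k in range(K):
            members = z_e[assign == k]
            if len(members) > 0:                             # leave empty clusters as-is
                centers[k] = members.mean(dim=0)
    codebook.weight.copy_(centers)
```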