vector-quantize-pytorch by lucidrains

PyTorch library for vector quantization techniques

created 5 years ago
3,444 stars

Top 14.3% on sourcepulse

Project Summary

This library provides PyTorch implementations of various vector and scalar quantization techniques, essential for discrete latent representation learning in generative models. It targets researchers and engineers working on advanced generative AI for images, audio, and speech, offering efficient and flexible building blocks for state-of-the-art models like VQ-VAE-2 and Jukebox.

How It Works

The library implements multiple quantization strategies, including standard Vector Quantization (VQ) with EMA codebook updates, Residual VQ for hierarchical quantization, and Grouped Residual VQ. It incorporates advanced techniques such as the rotation trick for gradient propagation, cosine-similarity codebook lookups for better codebook utilization, and safeguards against codebook collapse (e.g., dead-code replacement, projection to lower codebook dimensionality). Novel approaches such as Random Projection Quantizers, SimVQ, Finite Scalar Quantization (FSQ), Lookup Free Quantization (LFQ), and Latent Quantization are also provided, offering diverse trade-offs between complexity, performance, and codebook utilization.
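The core VQ-with-EMA strategy can be sketched in a few lines of NumPy. This is an illustrative toy, not the library's implementation (class and method names here are invented for the sketch); the real code adds cosine-similarity lookups, dead-code replacement, and many more options:

```python
import numpy as np

class SimpleVQEMA:
    """Minimal vector quantizer with EMA codebook updates (illustrative only)."""

    def __init__(self, codebook_size, dim, decay=0.99, eps=1e-5, seed=0):
        rng = np.random.default_rng(seed)
        self.codebook = rng.standard_normal((codebook_size, dim))
        self.decay = decay
        self.eps = eps
        # EMA state: per-code assignment counts and running input sums
        self.cluster_size = np.zeros(codebook_size)
        self.embed_avg = self.codebook.copy()

    def __call__(self, x):
        # nearest-neighbor lookup by squared Euclidean distance: (N, K)
        d = ((x[:, None, :] - self.codebook[None, :, :]) ** 2).sum(-1)
        indices = d.argmin(axis=1)
        quantized = self.codebook[indices]

        # EMA update of counts and sums, then Laplace-smoothed codebook refresh
        onehot = np.eye(len(self.codebook))[indices]
        self.cluster_size = self.decay * self.cluster_size + (1 - self.decay) * onehot.sum(0)
        self.embed_avg = self.decay * self.embed_avg + (1 - self.decay) * (onehot.T @ x)
        n = self.cluster_size.sum()
        smoothed = (self.cluster_size + self.eps) / (n + len(self.codebook) * self.eps) * n
        self.codebook = self.embed_avg / smoothed[:, None]
        return quantized, indices
```

Because the codebook is updated by exponential moving averages of assigned inputs rather than by gradients, no backward pass through the lookup is needed for the codebook itself; the library pairs this with a straight-through estimator for the encoder gradients.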

Quick Start & Requirements

  • Primary install: pip install vector-quantize-pytorch
  • Requirements: PyTorch. A CUDA-capable GPU is not required, but training at realistic scales benefits substantially from one.
  • Documentation: usage examples for each quantizer live in the project README.

Highlighted Details

  • Implements multiple VQ variants (VQ, ResidualVQ, GroupedResidualVQ, SimVQ, LFQ, FSQ, LatentQuantize).
  • Supports advanced training techniques: rotation trick, cosine similarity, k-means initialization, dead code handling.
  • Includes orthogonal regularization for translation equivariance in image generation.
  • Offers Random Projection Quantizers for speech modeling, inspired by Google's Universal Speech Model.
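Among the highlighted variants, FSQ stands out for having no learned codebook at all: each channel is bounded and rounded to a small fixed number of levels, and the implicit codebook is the Cartesian product of those levels. A minimal NumPy sketch of the idea (function names are my own; odd level counts are used here to sidestep the half-level offset the FSQ paper applies for even ones):

```python
import numpy as np

def fsq_quantize(z, levels):
    """Finite Scalar Quantization sketch: bound each channel, then round
    it to one of a fixed number of levels. No codebook is learned."""
    levels = np.asarray(levels, dtype=np.float64)
    half = (levels - 1) / 2.0
    bounded = np.tanh(z) * half     # squash each channel into [-half, half]
    quantized = np.round(bounded)   # snap to the nearest integer level
    return quantized / half         # renormalize to [-1, 1]

def fsq_codebook_size(levels):
    # the implicit codebook size is the product of per-channel level counts
    return int(np.prod(levels))
```

In training, the round would be wrapped in a straight-through estimator so gradients pass through unchanged; the sketch above shows only the forward quantization.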

Maintenance & Community

The repository is actively maintained by lucidrains, a prolific contributor in the AI research community. It references numerous influential papers, indicating strong ties to current research trends.

Licensing & Compatibility

The library is released under the MIT License, permitting commercial use and integration into closed-source projects.

Limitations & Caveats

While comprehensive, the README does not explicitly detail performance benchmarks across all implemented quantization methods or provide guidance on selecting the optimal method for specific tasks. Some advanced features might require significant computational resources or careful hyperparameter tuning.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull requests (30d): 1
  • Issues (30d): 1
  • Star history: 263 stars in the last 90 days
