vector-quantize-pytorch by lucidrains

PyTorch library for vector quantization techniques

created 5 years ago
3,444 stars

Top 14.3% on sourcepulse

Project Summary

This library provides PyTorch implementations of various vector and scalar quantization techniques, essential for discrete latent representation learning in generative models. It targets researchers and engineers working on advanced generative AI for images, audio, and speech, offering efficient and flexible building blocks for state-of-the-art models like VQ-VAE-2 and Jukebox.

How It Works

The library implements multiple quantization strategies, including standard Vector Quantization (VQ) with EMA codebook updates, Residual VQ for hierarchical quantization, and Grouped Residual VQ. It incorporates advanced techniques such as the rotation trick for gradient propagation, cosine-similarity codebook lookups for better codebook utilization, and safeguards against codebook collapse (e.g., dead-code replacement, projection to lower codebook dimensionality). Novel approaches such as Random Projection Quantizers, SimVQ, Finite Scalar Quantization (FSQ), Lookup Free Quantization (LFQ), and Latent Quantization are also provided, offering diverse trade-offs between complexity, performance, and codebook utilization.
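The core VQ-with-EMA strategy can be sketched in a few lines of NumPy. This is an illustrative toy, not the library's implementation (class and method names here are invented for the sketch); the real code adds cosine-similarity lookups, dead-code replacement, and many more options:

```python
import numpy as np

class SimpleVQEMA:
    """Minimal vector quantizer with EMA codebook updates (illustrative only)."""

    def __init__(self, codebook_size, dim, decay=0.99, eps=1e-5, seed=0):
        rng = np.random.default_rng(seed)
        self.codebook = rng.standard_normal((codebook_size, dim))
        self.decay = decay
        self.eps = eps
        # EMA state: per-code assignment counts and running input sums
        self.cluster_size = np.zeros(codebook_size)
        self.embed_avg = self.codebook.copy()

    def __call__(self, x):
        # nearest-neighbor lookup by squared Euclidean distance: (N, K)
        d = ((x[:, None, :] - self.codebook[None, :, :]) ** 2).sum(-1)
        indices = d.argmin(axis=1)
        quantized = self.codebook[indices]

        # EMA update of counts and sums, then Laplace-smoothed codebook refresh
        onehot = np.eye(len(self.codebook))[indices]
        self.cluster_size = self.decay * self.cluster_size + (1 - self.decay) * onehot.sum(0)
        self.embed_avg = self.decay * self.embed_avg + (1 - self.decay) * (onehot.T @ x)
        n = self.cluster_size.sum()
        smoothed = (self.cluster_size + self.eps) / (n + len(self.codebook) * self.eps) * n
        self.codebook = self.embed_avg / smoothed[:, None]
        return quantized, indices
```

Because the codebook is updated by exponential moving averages of assigned inputs rather than by gradients, no backward pass through the lookup is needed for the codebook itself; the library pairs this with a straight-through estimator for the encoder gradients.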

Quick Start & Requirements

  • Primary install: pip install vector-quantize-pytorch
  • Requirements: PyTorch. A CUDA-capable GPU is not required, but training at realistic scales benefits substantially from one.
  • Documentation: usage examples for each quantizer live in the project README.

Highlighted Details

  • Implements multiple VQ variants (VQ, ResidualVQ, GroupedResidualVQ, SimVQ, LFQ, FSQ, LatentQuantize).
  • Supports advanced training techniques: rotation trick, cosine similarity, k-means initialization, dead code handling.
  • Includes orthogonal regularization for translation equivariance in image generation.
  • Offers Random Projection Quantizers for speech modeling, inspired by Google's Universal Speech Model.
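Among the highlighted variants, FSQ stands out for having no learned codebook at all: each channel is bounded and rounded to a small fixed number of levels, and the implicit codebook is the Cartesian product of those levels. A minimal NumPy sketch of the idea (function names are my own; odd level counts are used here to sidestep the half-level offset the FSQ paper applies for even ones):

```python
import numpy as np

def fsq_quantize(z, levels):
    """Finite Scalar Quantization sketch: bound each channel, then round
    it to one of a fixed number of levels. No codebook is learned."""
    levels = np.asarray(levels, dtype=np.float64)
    half = (levels - 1) / 2.0
    bounded = np.tanh(z) * half     # squash each channel into [-half, half]
    quantized = np.round(bounded)   # snap to the nearest integer level
    return quantized / half         # renormalize to [-1, 1]

def fsq_codebook_size(levels):
    # the implicit codebook size is the product of per-channel level counts
    return int(np.prod(levels))
```

In training, the round would be wrapped in a straight-through estimator so gradients pass through unchanged; the sketch above shows only the forward quantization.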

Maintenance & Community

The repository is actively maintained by lucidrains, a prolific contributor in the AI research community. It references numerous influential papers, indicating strong ties to current research trends.

Licensing & Compatibility

The library is released under the MIT License, permitting commercial use and integration into closed-source projects.

Limitations & Caveats

While comprehensive, the README does not explicitly detail performance benchmarks across all implemented quantization methods or provide guidance on selecting the optimal method for specific tasks. Some advanced features might require significant computational resources or careful hyperparameter tuning.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull requests (30d): 1
  • Issues (30d): 1
  • Star history: 263 stars in the last 90 days
