jorgebmann: GPU-accelerated online vector quantization for RAG and ANN
Top 72.1% on SourcePulse
This library provides a Python implementation of Google's TurboQuant framework, enabling data-oblivious online vector quantization for embedding storage and approximate nearest neighbor (ANN) search. It offers significant memory compression for RAG pipelines and large-scale retrieval without requiring codebook training or multiple passes over the data, making it well suited to on-premise RAG and resource-constrained environments.
How It Works
The core approach applies a random orthogonal rotation to each vector, followed by per-coordinate scalar quantization using precomputed Lloyd-Max codebooks (an MSE-optimal quantizer). For inner product estimation, a 1-bit Quantized Johnson-Lindenstrauss (QJL) transform is applied to the quantization residual, yielding an unbiased estimator. Because vectors are quantized independently, ingestion is truly online, with no costly indexing or retraining steps.
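The rotate-then-quantize pipeline can be sketched with NumPy. This is a simplified illustration, not the library's implementation: a uniform grid stands in for the Lloyd-Max codebooks, and the 1-bit QJL residual step is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64

# A random orthogonal rotation, drawn once and shared by all vectors.
# It is data-oblivious: no training pass over the data is needed.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))

def quantize(x, bits=4):
    """Rotate, then scalar-quantize each coordinate independently.
    The rotation spreads the vector's energy evenly across coordinates,
    so a simple per-coordinate quantizer performs well."""
    z = Q @ x
    scale = np.abs(z).max()
    levels = 2 ** (bits - 1)
    codes = np.clip(np.round(z / scale * levels), -levels, levels - 1)
    return codes.astype(np.int8), scale

def dequantize(codes, scale, bits=4):
    levels = 2 ** (bits - 1)
    return Q.T @ (codes.astype(np.float64) / levels * scale)

x = rng.standard_normal(d)
codes, scale = quantize(x)           # each vector is encoded on its own
x_hat = dequantize(codes, scale)
rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
print(f"relative reconstruction error at 4 bits: {rel_err:.3f}")
```

Because each vector is encoded independently against a fixed rotation, new vectors can be ingested online with no re-indexing; in the full method, the 1-bit QJL pass over the residual additionally corrects the bias in inner-product estimates.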
Quick Start & Requirements
pip install pyturboquant
pip install pyturboquant[langchain]
pip install pyturboquant[dev]
pip install pyturboquant[all]

Highlighted Details
- TurboQuantIndex with a FAISS-like API.
- search_batch_size.
- TurboQuantVectorStore for low-RAM RAG pipelines.

Maintenance & Community
The project is marked as Work In Progress (WIP) with a roadmap indicating future enhancements like LlamaIndex integration and sub-linear search. No specific community channels (e.g., Discord, Slack) or notable contributors/sponsorships are detailed in the provided text.
Licensing & Compatibility
The project is released under the MIT License, which is permissive and generally compatible with commercial use and closed-source linking.
Limitations & Caveats
This is a Work In Progress (WIP) implementation. The current search compute complexity is O(n) per query, with sub-linear search planned for v0.5.0. The library focuses on compressing embedding vectors, not the embedding models themselves, meaning VRAM requirements for running models remain unchanged.
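The O(n) cost mentioned above corresponds to a full scan: inner products are estimated directly from the stored codes for every database vector, then the top-k are returned. A hypothetical NumPy sketch of such a scan (not the library's API; all names here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, bits = 32, 1000, 8

# Shared random rotation; the database keeps only int8 codes + one
# scale per vector, a fraction of the float32 footprint.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
levels = 2 ** (bits - 1)

def encode(X):
    """Quantize every row: rotate, then round each coordinate."""
    Z = X @ Q.T
    scales = np.abs(Z).max(axis=1, keepdims=True)
    codes = np.clip(np.round(Z / scales * levels), -levels, levels - 1)
    return codes.astype(np.int8), scales

X = rng.standard_normal((n, d))
codes, scales = encode(X)

def search(q, k=5):
    """O(n) scan: estimate <q, x_i> from the codes, return top-k ids.
    Inner products are rotation-invariant: <q, x> = <Qq, Qx>."""
    zq = Q @ q
    est = (codes.astype(np.float32) / levels * scales) @ zq
    return np.argsort(-est)[:k]

# A query that is a slightly perturbed copy of database vector 42
# should rank it within the top-k despite the compression.
q = X[42] + 0.01 * rng.standard_normal(d)
ids = search(q)
print(42 in ids)
```

Sub-linear search (planned for v0.5.0) would replace this exhaustive scan with an index structure that prunes most candidates before scoring.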