turbovec by RyanCodrai

Accelerate vector search with data-oblivious quantization

Created 2 weeks ago

268 stars

Top 95.6% on SourcePulse

Project Summary

Fast vector quantization in Rust with Python bindings, implementing Google Research's TurboQuant (ICLR 2026). Unlike methods such as FAISS PQ, its data-oblivious approach to vector compression eliminates the need to train codebooks or to retrain when the data changes. The result is faster index creation, simpler infrastructure, and competitive recall for similarity-search applications.

How It Works

TurboQuant compresses vectors by treating them as directions on a hypersphere. Each vector is normalized to unit length, then passed through a random orthogonal rotation so that its coordinates follow a known distribution (Beta, converging to Gaussian as dimension grows). Because that distribution is known in advance, each coordinate can be bucketed with a Lloyd-Max scalar quantizer that is optimal for it, with no training on the data itself. The quantized coordinates are then bit-packed for significant compression (e.g., 16x at 2-bit width). At search time, the query is rotated into the same domain and scored directly against precomputed codebook values using SIMD intrinsics. Because the method is data-oblivious, it avoids costly training steps and supports dynamic insertion of new vectors.
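The steps above can be sketched in a few lines of NumPy. This is an illustrative toy, not the library's implementation: the rotation is a Haar-random orthogonal matrix from a QR decomposition, and the threshold/level constants are the standard 2-bit Lloyd-Max values for a unit Gaussian (an assumption; TurboQuant's exact quantizer may differ).

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rotation(d):
    # Random orthogonal matrix via QR of a Gaussian matrix;
    # sign correction makes the distribution uniform (Haar)
    q, r = np.linalg.qr(rng.normal(size=(d, d)))
    return q * np.sign(np.diag(r))

# 2-bit Lloyd-Max quantizer for a standard Gaussian:
# decision thresholds and reconstruction levels (textbook values)
THRESHOLDS = np.array([-0.9816, 0.0, 0.9816])
LEVELS = np.array([-1.510, -0.4528, 0.4528, 1.510])

def quantize(vectors, Q):
    d = vectors.shape[1]
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    rotated = unit @ Q.T
    # coordinates of a rotated unit vector are ~ N(0, 1/d); rescale to N(0, 1)
    z = rotated * np.sqrt(d)
    return np.searchsorted(THRESHOLDS, z).astype(np.uint8)  # codes 0..3

def dequantize(codes, d):
    return LEVELS[codes] / np.sqrt(d)  # undo the rescaling

def pack2bit(codes):
    # pack four 2-bit codes per byte -> 16x smaller than FP32
    c = codes.reshape(codes.shape[0], -1, 4)
    return (c[..., 0] | (c[..., 1] << 2) | (c[..., 2] << 4) | (c[..., 3] << 6)).astype(np.uint8)

d = 256
Q = random_rotation(d)
x = rng.normal(size=(100, d))
codes = quantize(x, Q)

# Reconstruct and check angular fidelity against the original directions
approx = dequantize(codes, d) @ Q
unit = x / np.linalg.norm(x, axis=1, keepdims=True)
cos = np.sum(approx * unit, axis=1) / np.linalg.norm(approx, axis=1)
```

Even at 2 bits per coordinate, the reconstructed directions stay close to the originals (mean cosine similarity well above 0.9 in this sketch), which is why inner-product search over the codes remains accurate.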

Quick Start & Requirements

  • Python Installation:
    pip install maturin
    cd turbovec-python
    RUSTFLAGS="-C target-cpu=native" maturin build --release
    pip install target/wheels/*.whl
    
  • Rust Installation:
    cargo build --release
    
  • Prerequisites: Rust toolchain, Python, maturin.
  • Benchmarks: Requires downloading datasets (python3 benchmarks/download_data.py all). Individual benchmark scripts are located in benchmarks/suite/.
  • References: TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate (ICLR 2026).

Highlighted Details

  • Compression: Achieves significant compression, reducing a 1536-dimensional FP32 vector from 6,144 bytes to 384 bytes (16x) at 2-bit width.
  • Recall: At d=3072 2-bit, TurboQuant recall (0.912) exceeds FAISS PQ FastScan (0.903). At d=1536 2-bit, FAISS is slightly ahead (0.882 vs 0.870). Recall discrepancies vary by dimension/bit width and require further investigation.
  • Performance (ARM M3 Max): TurboQuant speed is within 2-25% of FAISS, with ongoing optimization.
  • Performance (x86 Sapphire Rapids): TurboQuant is 1.4-3.7x slower than FAISS, with ongoing optimization efforts.
  • Theoretical Basis: Claims distortion within 2.7x of the information-theoretic lower bound.
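The compression figure in the first bullet is straightforward arithmetic on the bit widths:

```python
# Compression for a d = 1536 vector at 2-bit width vs. FP32
d = 1536
fp32_bytes = d * 4           # FP32: 4 bytes per coordinate
packed_bytes = d * 2 // 8    # 2 bits per coordinate, bit-packed
ratio = fp32_bytes // packed_bytes
```

This yields 6,144 bytes down to 384 bytes, a 16x reduction, matching the stated numbers.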

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord, Slack), or roadmap were found in the provided README.

Licensing & Compatibility

The README does not explicitly state the project's license. This gap makes it impossible to assess suitability for commercial use or integration into closed-source projects.

Limitations & Caveats

Performance on x86 architectures is currently significantly slower than FAISS, though optimization is in progress. Observed recall differences compared to FAISS vary and warrant further investigation. The lack of explicit licensing information presents a potential adoption blocker.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 1
  • Star History: 268 stars in the last 16 days
