turbovec by RyanCodrai

Accelerate vector search with data-oblivious quantization

Created 2 weeks ago

268 stars

Top 95.6% on SourcePulse

Project Summary

Fast vector quantization in Rust with Python bindings, implementing Google Research's TurboQuant (ICLR 2026). Unlike methods such as FAISS PQ, its data-oblivious approach to vector compression eliminates the need to train codebooks or to retrain when the data changes. The result is faster index creation, simpler infrastructure, and competitive recall for similarity-search applications.

How It Works

TurboQuant compresses vectors by treating them as directions on a hypersphere. Each vector is normalized to unit length, then passed through a random orthogonal rotation so that its coordinates follow a known distribution (Beta, converging to Gaussian as dimension grows). Because that distribution is known in advance, each coordinate can be bucketed with a Lloyd-Max scalar quantizer that is optimal for it, with no training on the data itself. The quantized coordinates are then bit-packed for significant compression (e.g., 16x at 2-bit width). At search time, the query is rotated into the same domain and scored directly against precomputed codebook values using SIMD intrinsics. Because the method is data-oblivious, it avoids costly training steps and supports dynamic insertion of new vectors.
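The steps above can be sketched in a few lines of NumPy. This is an illustrative toy, not the library's implementation: the rotation is a Haar-random orthogonal matrix from a QR decomposition, and the threshold/level constants are the standard 2-bit Lloyd-Max values for a unit Gaussian (an assumption; TurboQuant's exact quantizer may differ).

```python
import numpy as np

rng = np.random.default_rng(0)

def random_rotation(d):
    # Random orthogonal matrix via QR of a Gaussian matrix;
    # sign correction makes the distribution uniform (Haar)
    q, r = np.linalg.qr(rng.normal(size=(d, d)))
    return q * np.sign(np.diag(r))

# 2-bit Lloyd-Max quantizer for a standard Gaussian:
# decision thresholds and reconstruction levels (textbook values)
THRESHOLDS = np.array([-0.9816, 0.0, 0.9816])
LEVELS = np.array([-1.510, -0.4528, 0.4528, 1.510])

def quantize(vectors, Q):
    d = vectors.shape[1]
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    rotated = unit @ Q.T
    # coordinates of a rotated unit vector are ~ N(0, 1/d); rescale to N(0, 1)
    z = rotated * np.sqrt(d)
    return np.searchsorted(THRESHOLDS, z).astype(np.uint8)  # codes 0..3

def dequantize(codes, d):
    return LEVELS[codes] / np.sqrt(d)  # undo the rescaling

def pack2bit(codes):
    # pack four 2-bit codes per byte -> 16x smaller than FP32
    c = codes.reshape(codes.shape[0], -1, 4)
    return (c[..., 0] | (c[..., 1] << 2) | (c[..., 2] << 4) | (c[..., 3] << 6)).astype(np.uint8)

d = 256
Q = random_rotation(d)
x = rng.normal(size=(100, d))
codes = quantize(x, Q)

# Reconstruct and check angular fidelity against the original directions
approx = dequantize(codes, d) @ Q
unit = x / np.linalg.norm(x, axis=1, keepdims=True)
cos = np.sum(approx * unit, axis=1) / np.linalg.norm(approx, axis=1)
```

Even at 2 bits per coordinate, the reconstructed directions stay close to the originals (mean cosine similarity well above 0.9 in this sketch), which is why inner-product search over the codes remains accurate.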

Quick Start & Requirements

  • Python Installation:
    pip install maturin
    cd turbovec-python
    RUSTFLAGS="-C target-cpu=native" maturin build --release
    pip install target/wheels/*.whl
    
  • Rust Installation:
    cargo build --release
    
  • Prerequisites: Rust toolchain, Python, maturin.
  • Benchmarks: Requires downloading datasets (python3 benchmarks/download_data.py all). Individual benchmark scripts are located in benchmarks/suite/.
  • References: TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate (ICLR 2026).

Highlighted Details

  • Compression: Achieves significant compression, reducing a 1536-dimensional FP32 vector from 6,144 bytes to 384 bytes (16x) at 2-bit width.
  • Recall: At d=3072 2-bit, TurboQuant recall (0.912) exceeds FAISS PQ FastScan (0.903). At d=1536 2-bit, FAISS is slightly ahead (0.882 vs 0.870). Recall discrepancies vary by dimension/bit width and require further investigation.
  • Performance (ARM M3 Max): TurboQuant speed is within 2-25% of FAISS, with ongoing optimization.
  • Performance (x86 Sapphire Rapids): TurboQuant is 1.4-3.7x slower than FAISS, with ongoing optimization efforts.
  • Theoretical Basis: Claims distortion within 2.7x of the information-theoretic lower bound.
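The compression figure in the first bullet is straightforward arithmetic on the bit widths:

```python
# Compression for a d = 1536 vector at 2-bit width vs. FP32
d = 1536
fp32_bytes = d * 4           # FP32: 4 bytes per coordinate
packed_bytes = d * 2 // 8    # 2 bits per coordinate, bit-packed
ratio = fp32_bytes // packed_bytes
```

This yields 6,144 bytes down to 384 bytes, a 16x reduction, matching the stated numbers.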

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord, Slack), or roadmap were found in the provided README.

Licensing & Compatibility

The README does not explicitly state the project's license. This gap makes it impossible to assess suitability for commercial use or integration into closed-source projects.

Limitations & Caveats

Performance on x86 architectures is currently significantly slower than FAISS, though optimization is in progress. Observed recall differences compared to FAISS vary and warrant further investigation. The lack of explicit licensing information presents a potential adoption blocker.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 1
  • Star History: 268 stars in the last 16 days
