Rust library for fast vector similarity computations
This library provides efficient computation of various similarity measures between vectors, targeting data scientists, machine learning engineers, and researchers. It offers optimized Rust implementations with Python bindings for seamless integration, enabling faster and more robust analysis of vector relationships, particularly with high-dimensional data from LLMs.
How It Works
The library leverages Rust's performance capabilities, utilizing the rayon crate for parallel processing and ndarray for vectorized operations. It implements several similarity measures, including Spearman's Rho, Kendall's Tau (optimized with merge sort for inversion counting), Approximate Distance Correlation, Jensen-Shannon Dependency Measure, Hoeffding's D, and Normalized Mutual Information. A key feature is its bootstrapping functionality for robust estimation and confidence intervals.
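To illustrate the merge-sort idea mentioned for Kendall's Tau, here is a minimal pure-Python sketch. It is not the library's Rust implementation or its API: the function names `count_inversions` and `kendall_tau` are hypothetical, and the sketch assumes no tied values in either vector (tau-a). After sorting the pairs by x, every inversion in the resulting y sequence is one discordant pair, so counting inversions during a merge sort gives the statistic in O(n log n) rather than O(n^2).

```python
def count_inversions(values):
    """Return (sorted copy, number of inversions) via merge sort."""
    n = len(values)
    if n <= 1:
        return list(values), 0
    mid = n // 2
    left, inv_left = count_inversions(values[:mid])
    right, inv_right = count_inversions(values[mid:])
    merged = []
    inversions = inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] is smaller than every element still waiting in left,
            # so it forms one inversion with each of them.
            inversions += len(left) - i
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions


def kendall_tau(x, y):
    """Kendall's tau-a for two equal-length sequences WITHOUT ties.

    Assumption for this sketch: all values in x are distinct and all
    values in y are distinct. Tie handling (tau-b) is not covered.
    """
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if n < 2:
        raise ValueError("need at least two observations")
    if len(set(x)) != n or len(set(y)) != n:
        raise ValueError("this sketch does not handle tied values")
    # Reorder y so that it follows ascending x.
    order = sorted(range(n), key=lambda k: x[k])
    y_by_x = [y[k] for k in order]
    # Each inversion in y_by_x is exactly one discordant pair.
    _, discordant = count_inversions(y_by_x)
    total_pairs = n * (n - 1) // 2
    # tau = (concordant - discordant) / total = 1 - 2 * discordant / total
    return 1.0 - 2.0 * discordant / total_pairs


if __name__ == "__main__":
    # 3 of the 10 pairs are discordant, so tau = 1 - 2 * 3 / 10.
    print(kendall_tau([1, 2, 3, 4, 5], [3, 1, 2, 5, 4]))
```

Resampling with replacement, as bootstrapping does, introduces duplicate observations, so a tie-aware variant would be needed before pairing this sketch with a bootstrap loop.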
Quick Start & Requirements
pip install fast_vector_similarity
Cargo.toml
Highlighted Details
Parallel processing via rayon and vectorized operations via ndarray for performance.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The README does not specify a license, which is a significant blocker for determining commercial use or integration compatibility. Some measures like Jensen-Shannon Dependency Measure have been "revised for improved utility," implying potential breaking changes or shifts in interpretation.