fast_vector_similarity  by Dicklesworthstone

Rust library for fast vector similarity computations

created 1 year ago
397 stars

Top 73.8% on sourcepulse

Project Summary

This library computes several similarity measures between vectors and is aimed at data scientists, machine learning engineers, and researchers. It pairs optimized Rust implementations with Python bindings so the measures can be called from Python workflows. It is intended in particular for high-dimensional vectors produced by LLMs.

How It Works

The library is written in Rust and uses the rayon crate for parallel processing and ndarray for vectorized operations. It implements six similarity measures: Spearman's Rho, Kendall's Tau (which counts inversions with a merge sort), Approximate Distance Correlation, Jensen-Shannon Dependency Measure, Hoeffding's D, and Normalized Mutual Information. It also provides bootstrapping, which yields more robust estimates along with confidence intervals.
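The merge-sort approach to Kendall's Tau mentioned above can be illustrated with a minimal Python sketch. It is not the library's code: the function names `count_inversions` and `kendall_tau_no_ties` are hypothetical, and the sketch assumes neither input contains tied values (the tau-a case). The idea is to order one vector by the other and count inversions, each of which corresponds to one discordant pair, in O(n log n) rather than O(n^2).

```python
def count_inversions(values):
    """Return (sorted copy, number of inversions) using merge sort."""
    n = len(values)
    if n <= 1:
        return list(values), 0
    mid = n // 2
    left, inv_left = count_inversions(values[:mid])
    right, inv_right = count_inversions(values[mid:])
    merged = []
    inversions = inv_left + inv_right
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            # right[j] precedes every remaining element of left,
            # so it forms one inversion with each of them.
            merged.append(right[j])
            inversions += len(left) - i
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged, inversions


def kendall_tau_no_ties(x, y):
    """Kendall's tau-a for equal-length sequences without tied values (assumption)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    # Reorder y by ascending x; discordant pairs then appear as inversions in y.
    order = sorted(range(n), key=lambda k: x[k])
    y_by_x = [y[k] for k in order]
    _, discordant = count_inversions(y_by_x)
    total_pairs = n * (n - 1) // 2
    # tau = (concordant - discordant) / total = 1 - 2 * discordant / total
    return 1.0 - 2.0 * discordant / total_pairs
```

With ties present, a tie-corrected variant (tau-b) would be needed; this sketch deliberately leaves that out.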

Highlighted Details

  • Implements 6 distinct similarity measures: Spearman's Rho, Kendall's Tau, Approximate Distance Correlation, Jensen-Shannon Dependency Measure, Hoeffding's D, and Normalized Mutual Information.
  • Features parallel processing via rayon and vectorized operations via ndarray for performance.
  • Includes bootstrapping for robust estimation and confidence intervals.
  • Compatible with high-dimensional vectors (e.g., 4096-dim) from LLMs like Llama2.
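To show what bootstrapping a similarity measure can look like, here is a minimal sketch of a generic percentile bootstrap around Spearman's Rho, using only the Python standard library. It is an illustration under stated assumptions, not the library's procedure or API: the names `average_ranks`, `spearman_rho`, and `bootstrap_ci`, the default of 1000 resamples, and the choice to return 0.0 for a constant resample are all hypothetical.

```python
import random


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; returns 0.0 if either input is constant (assumption)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / (var_a * var_b) ** 0.5


def spearman_rho(x, y):
    """Spearman's Rho as the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def bootstrap_ci(x, y, stat, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(x, y), resampling pairs with replacement."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(stat([x[k] for k in idx], [y[k] for k in idx]))
    estimates.sort()
    lower = estimates[int((alpha / 2) * (n_resamples - 1))]
    upper = estimates[int((1 - alpha / 2) * (n_resamples - 1))]
    return lower, upper
```

Calling `bootstrap_ci(x, y, spearman_rho)` on two equal-length lists returns a `(lower, upper)` pair. Resampling pairs rather than each vector independently preserves the relationship being measured.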

Maintenance & Community

  • The project is maintained by Dicklesworthstone. No other contributors, sponsorships, or community links (Discord/Slack) are mentioned in the README.

Licensing & Compatibility

  • The README does not explicitly state a license. This is a critical omission for adoption.

Limitations & Caveats

Without a stated license, commercial use and integration compatibility cannot be determined. The README also notes that some measures, such as the Jensen-Shannon Dependency Measure, have been "revised for improved utility," which suggests that their values or interpretation may differ between versions.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 12 stars in the last 90 days
