SimSIMD by ashvardanian

SIMD-optimized library for similarity metrics in Python, Rust, C, JS, and Swift

Created 2 years ago
1,491 stars

Top 27.7% on SourcePulse

Project Summary

SimSIMD is a high-performance, mixed-precision math library designed to accelerate vector operations for machine learning, scientific computing, and data retrieval. It offers a broad range of distance and similarity metrics, supporting various data types from float64 down to bit vectors, and is optimized for both x86 (AVX2, AVX-512) and Arm (NEON, SVE, SVE2) architectures.

How It Works

SimSIMD relies on extensive SIMD (Single Instruction, Multiple Data) optimizations to approach memcpy speeds for vector operations. Its core design principle is to maximize hardware utilization, with reported speedups of up to 200x over libraries such as NumPy and SciPy. Techniques include Horner's method for polynomial approximations, masked loads to eliminate loop tails, and custom Newton-Raphson iterations that refine reciprocal square root estimates for better accuracy at low cost. The library also emphasizes mixed-precision computation: it accumulates in a higher precision than the inputs to prevent overflow and preserve accuracy, especially with low-precision inputs such as float16 and bfloat16.
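Three of these techniques can be illustrated in plain Python. This is a minimal sketch of the textbook forms, not SimSIMD's actual kernels, which are written in C with SIMD intrinsics. The helper names (to_float16, horner, rsqrt_refined, sum_float16_accumulator, sum_wide_accumulator) are hypothetical, and the half-precision rounding stands in for both a coarse hardware estimate and a narrow accumulator.

```python
import math
import struct


def to_float16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def horner(coefficients, x):
    """Evaluate a polynomial (highest-degree coefficient first) using one
    multiply-add per coefficient instead of computing explicit powers."""
    result = 0.0
    for c in coefficients:
        result = result * x + c
    return result


def rsqrt_refined(x: float, steps: int = 1) -> float:
    """Refine a crude 1/sqrt(x) estimate with Newton-Raphson steps."""
    # Assumption: the exact value rounded to half precision plays the role
    # of a low-accuracy hardware estimate.
    y = to_float16(1.0 / math.sqrt(x))
    for _ in range(steps):
        # Newton-Raphson update for f(y) = 1/y^2 - x.
        y = y * (1.5 - 0.5 * x * y * y)
    return y


def sum_float16_accumulator(values):
    """Sum half-precision inputs while also keeping the total in half precision."""
    total = 0.0
    for v in values:
        total = to_float16(total + to_float16(v))
    return total


def sum_wide_accumulator(values):
    """Sum half-precision inputs into a wider (double-precision) total."""
    total = 0.0
    for v in values:
        total += to_float16(v)
    return total


ones = [1.0] * 4096
print("2x^3 - 6x^2 + 2x - 1 at x=3:", horner([2.0, -6.0, 2.0, -1.0], 3.0))
print("rsqrt error after one step:", abs(rsqrt_refined(2.0) - 1.0 / math.sqrt(2.0)))
print("float16 accumulator:", sum_float16_accumulator(ones))
print("wide accumulator:", sum_wide_accumulator(ones))
```

The half-precision accumulator stalls at 2048.0 because 2049 is not representable in float16, while the wider accumulator reaches 4096.0. That gap is the motivation for accumulating in a higher precision than the inputs.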

Quick Start & Requirements

  • Install: pip install simsimd
  • Prerequisites: Python 3.x. Full hardware acceleration requires a compatible CPU (x86 with AVX2/AVX-512, or Arm with NEON/SVE/SVE2).
  • Hardware Introspection: python -c "import simsimd; print(simsimd.get_capabilities())"
  • Documentation: python -c "import simsimd; help(simsimd)"
  • Links: Official Docs, Python API, Rust API, C API
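
After installation, a first call might look like the sketch below. It assumes NumPy is installed alongside simsimd and that float32 inputs are used. The pure-Python cosine_distance_reference helper is hypothetical and only serves as a sanity check. The accelerated calls are skipped if either package is missing.

```python
import math


def cosine_distance_reference(a, b):
    """Pure-Python cosine distance (1 - cosine similarity), used as a check."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
print("reference cosine distance:", cosine_distance_reference(a, b))

try:
    import numpy as np
    import simsimd
except ImportError:
    print("simsimd or numpy not installed; skipping the accelerated calls")
else:
    va = np.array(a, dtype=np.float32)
    vb = np.array(b, dtype=np.float32)
    # Report which SIMD backends were detected on this machine.
    print("capabilities:", simsimd.get_capabilities())
    print("simsimd cosine distance:", simsimd.cosine(va, vb))
    print("simsimd squared Euclidean:", simsimd.sqeuclidean(va, vb))
```

The two cosine values should agree to within float32 rounding. Small differences are expected because the inputs are stored at lower precision than Python floats.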

Highlighted Details

  • Supports over 350 SIMD-optimized kernels, including Euclidean, Cosine, Hamming, Jaccard, and complex dot products.
  • Handles diverse data types: float64, float32, float16, bfloat16 (real & complex), int8, int4, and binary (b8) vectors.
  • Ships as a zero-dependency, header-only C99 library with bindings for Python, Rust, JavaScript, and Swift.
  • Offers dynamic dispatch for runtime CPU feature detection and optimization.
  • Provides efficient one-to-many, many-to-many, and all-pairs distance calculations.
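
To make the binary metrics and the batch shapes concrete, here is a pure-Python sketch of their semantics. It is a reference for what the results mean, not how SimSIMD computes them. All function names are hypothetical, bit vectors are assumed to be packed eight bits per byte, and reading "many-to-many" as row-by-row pairing is an assumption.

```python
def popcount(packed: bytes) -> int:
    """Count the set bits in a packed bit vector."""
    return sum(bin(byte).count("1") for byte in packed)


def hamming_b8(a: bytes, b: bytes) -> int:
    """Number of bit positions where two packed bit vectors differ."""
    return popcount(bytes(x ^ y for x, y in zip(a, b)))


def jaccard_b8(a: bytes, b: bytes) -> float:
    """Jaccard distance: 1 - |intersection| / |union| over set bits."""
    intersection = popcount(bytes(x & y for x, y in zip(a, b)))
    union = popcount(bytes(x | y for x, y in zip(a, b)))
    return 1.0 - intersection / union if union else 0.0


def one_to_many(metric, query, rows):
    """Distance from one query to every row."""
    return [metric(query, row) for row in rows]


def many_to_many(metric, rows_a, rows_b):
    """Distance between rows at matching positions (assumed reading)."""
    return [metric(x, y) for x, y in zip(rows_a, rows_b)]


def all_pairs(metric, rows_a, rows_b):
    """Full distance matrix between every row of A and every row of B."""
    return [[metric(x, y) for y in rows_b] for x in rows_a]


a = bytes([0b11110000, 0b00001111])
b = bytes([0b11111111, 0b00000000])
rows = [a, b]

print("hamming:", hamming_b8(a, b))
print("jaccard:", jaccard_b8(a, b))
print("one-to-many:", one_to_many(hamming_b8, a, rows))
print("many-to-many:", many_to_many(hamming_b8, rows, rows))
print("all-pairs:", all_pairs(hamming_b8, rows, rows))
```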

Maintenance & Community

The project is actively maintained by ashvardanian. Community interaction and contributions are encouraged via GitHub issues and pull requests.

Licensing & Compatibility

Licensed under Apache 2.0 or the Three-clause BSD license, allowing for commercial use and integration into closed-source projects.

Limitations & Caveats

Some benchmarks are marked as "in progress" (🚧), indicating ongoing development or potential issues. While the library aims for broad compatibility, specific performance characteristics may vary across different CPU generations and microarchitectures.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull requests (30d): 2
  • Issues (30d): 2
  • Star history: 41 stars in the last 30 days
