usearch by unum-cloud

Similarity search & clustering engine for vectors and arbitrary objects

created 2 years ago
3,014 stars

Top 16.2% on sourcepulse

Project Summary

USearch is a high-performance, single-file C++ vector search and clustering engine designed for speed, efficiency, and broad language compatibility. It targets developers and researchers who need to run similarity searches over large vector datasets, and it offers a lightweight alternative to heavier libraries like FAISS. Its strengths are minimal dependencies, extensive language bindings, and advanced features like user-defined metrics and memory-efficient indexing.

How It Works

USearch implements the Hierarchical Navigable Small World (HNSW) algorithm for approximate nearest neighbor search; the project claims its implementation is 10x faster than the one in FAISS. The core is a compact, single-header C++ library, which makes it easy to integrate across platforms and languages. Key advantages include SIMD optimization, support for half-precision and quarter-precision data types (f16, i8), and the ability to view large indexes from disk without loading them fully into RAM. It also supports user-defined metrics via JIT compilation, filtering predicates, and near-real-time clustering.
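The graph traversal at the heart of HNSW can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not USearch's implementation: it uses a single graph layer, Euclidean distance, and a candidate list of size one, whereas full HNSW descends through several layers and keeps a wider candidate list. The names `euclidean`, `greedy_search`, `vectors`, and `neighbors` are hypothetical and chosen for illustration only.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_search(vectors, neighbors, query, entry):
    """Walk a proximity graph from `entry`, moving to the closest neighbor
    until no neighbor is closer to `query` than the current node.

    vectors:   dict mapping node key -> vector (hypothetical layout)
    neighbors: dict mapping node key -> list of adjacent node keys
    Returns (key, distance) of the local optimum that was reached.
    """
    current = entry
    current_dist = euclidean(vectors[current], query)
    while True:
        # Score every neighbor of the current node against the query.
        scored = [(euclidean(vectors[n], query), n) for n in neighbors[current]]
        if not scored:
            return current, current_dist
        best_dist, best = min(scored)
        if best_dist >= current_dist:
            # No neighbor improves on the current node: stop here.
            return current, current_dist
        current, current_dist = best, best_dist


# Toy graph: five points on a line, each linked to its immediate neighbors.
vectors = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (2.0, 0.0), 3: (3.0, 0.0), 4: (4.0, 0.0)}
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

found_key, found_dist = greedy_search(vectors, neighbors, query=(3.2, 0.0), entry=0)
print(found_key)  # 3
```

Because the walk only follows edges that reduce the distance, it can stop at a local optimum on a poorly connected graph. That is why the result is approximate, and why HNSW adds upper layers with long-range links to pick a good entry point.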

Quick Start & Requirements

Highlighted Details

  • HNSW implementation claimed to be 10x faster than the one in FAISS.
  • Single-file C++11 header library with minimal dependencies.
  • Supports 10+ languages including Python, JavaScript, Rust, Java, C#, and Go.
  • Hardware-agnostic f16 & i8 precision support.
  • User-defined metrics with Numba, Cppyy, or PeachPy.
  • Memory-efficient indexing with optional 40-bit neighbor references.
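The memory argument behind 40-bit neighbor references can be sketched in a few lines of Python. This is an illustration under assumptions, not USearch's storage format: the helper names `pack_refs` and `unpack_refs`, the little-endian byte order, and the flat layout are all hypothetical. The point is only that 5 bytes address up to 2^40 entries, which is beyond the roughly 4 billion limit of 32-bit references, while using 37.5% less space than 8-byte references.

```python
def pack_refs(refs, width=5):
    """Pack non-negative integers into `width` bytes each (5 bytes = 40 bits)."""
    limit = 1 << (8 * width)
    out = bytearray()
    for ref in refs:
        if not 0 <= ref < limit:
            raise ValueError(f"reference {ref} does not fit in {8 * width} bits")
        out += ref.to_bytes(width, "little")
    return bytes(out)


def unpack_refs(blob, width=5):
    """Inverse of pack_refs: split the buffer into fixed-width integers."""
    return [
        int.from_bytes(blob[i:i + width], "little")
        for i in range(0, len(blob), width)
    ]


# The third value exceeds 2**32 - 1, so it would not fit in a 32-bit reference.
refs = [0, 1, 2**32 + 7, 2**40 - 1]
packed = pack_refs(refs)

print(len(packed))    # 20 bytes with 40-bit references
print(8 * len(refs))  # 32 bytes with 64-bit references
```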

Maintenance & Community

The project is actively maintained by Ash Vardanian and Unum Cloud. It has integrations with major platforms like ClickHouse, DuckDB, LangChain, and Microsoft Semantic Kernel. Community channels are available via Discord.

Licensing & Compatibility

USearch is released under the Apache License 2.0, which permits commercial use and integration into closed-source projects.

Limitations & Caveats

While USearch offers broad language support, advanced features like user-defined metrics, batch operations, and filtering predicates are not available in every language binding. Variable-length vectors and 4B+ capacity support are currently limited to the C++ interface.
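To make the terms concrete, here is a small Python sketch of what a user-defined metric and a filtering predicate mean in a search call. It is an exact, brute-force search written for illustration and does not use or mirror the USearch API; the names `inner_product_distance`, `search`, `metric`, and `predicate` are hypothetical.

```python
import heapq


def inner_product_distance(a, b):
    """Example user-defined metric: 1 minus the dot product."""
    return 1.0 - sum(x * y for x, y in zip(a, b))


def search(vectors, query, count, metric, predicate=lambda key: True):
    """Return up to `count` (key, distance) pairs, closest first.

    metric:    callable(vector, query) -> float, supplied by the caller
    predicate: callable(key) -> bool; entries for which it is False are skipped
    """
    candidates = (
        (metric(vec, query), key)
        for key, vec in vectors.items()
        if predicate(key)
    )
    return [(key, dist) for dist, key in heapq.nsmallest(count, candidates)]


vectors = {10: (1.0, 0.0), 11: (0.0, 1.0), 12: (0.6, 0.8), 13: (0.8, 0.6)}
query = (1.0, 0.0)

unfiltered = search(vectors, query, 2, inner_product_distance)
even_only = search(vectors, query, 2, inner_product_distance,
                   predicate=lambda key: key % 2 == 0)

print([key for key, _ in unfiltered])  # [10, 13]
print([key for key, _ in even_only])   # [10, 12]
```

Applying the predicate during the search, rather than filtering the results afterwards, means the caller still gets `count` matches whenever enough entries pass the filter.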

Health Check

Last commit: 3 weeks ago
Responsiveness: 1 day
Pull Requests (30d): 10
Issues (30d): 14

Star History

343 stars in the last 90 days
