USearch by unum-cloud

Similarity search & clustering engine for vectors and arbitrary objects

Created 2 years ago

3,610 stars

Top 13.3% on SourcePulse

10 Experts Love This Project

Marko Budiselic (Cofounder of MemGraph)

Georgios Konstantopoulos (CTO, General Partner at Paradigm)

Meng Zhang (Cofounder of TabbyML)

Eiso Kant (Cofounder of Poolside AI)

and 6 more!

Project Summary

USearch is a single-file C++ engine for vector similarity search and clustering, built for speed, a small footprint, and broad language compatibility. It targets developers and researchers who run similarity searches over large vector datasets and want a lightweight alternative to heavier libraries such as FAISS. Its main strengths are minimal dependencies, bindings for many languages, user-defined metrics, and memory-efficient indexing.

How It Works

USearch implements the Hierarchical Navigable Small World (HNSW) algorithm for approximate nearest neighbor search; the project claims its implementation is 10x faster than the one in FAISS. The core is a compact, single-header C++ library, which keeps integration simple across platforms and languages. Key advantages include SIMD optimization, support for reduced-precision storage types (half-precision floats, f16, and 8-bit integers, i8), and the ability to view large indexes directly from disk without loading them fully into RAM. It also supports user-defined metrics via JIT compilation, filtering predicates, and near-real-time clustering.
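To make the HNSW idea above more concrete, here is a minimal sketch of the best-first search that HNSW-style indexes run inside a single graph layer. It is not USearch's code: the function names, the `ef` parameter name, and the dictionary-based graph are assumptions chosen for illustration, and a real index stacks several such layers and builds the edges itself.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_layer_search(vectors, neighbors, query, entry, ef=4):
    """Best-first search over one proximity-graph layer (illustrative only).

    vectors:   dict mapping key -> vector
    neighbors: dict mapping key -> list of neighbor keys (graph edges)
    entry:     key of the node where the search starts
    ef:        number of best results kept while exploring
    Returns up to `ef` (distance, key) pairs, closest first.
    """
    d0 = euclidean(vectors[entry], query)
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexplored node first
    best = [(-d0, entry)]       # max-heap via negation: worst kept result on top

    while candidates:
        dist, node = heapq.heappop(candidates)
        if len(best) >= ef and dist > -best[0][0]:
            break  # the closest candidate is worse than every kept result
        for nxt in neighbors[node]:
            if nxt in visited:
                continue
            visited.add(nxt)
            d = euclidean(vectors[nxt], query)
            if len(best) < ef or d < -best[0][0]:
                heapq.heappush(candidates, (d, nxt))
                heapq.heappush(best, (-d, nxt))
                if len(best) > ef:
                    heapq.heappop(best)  # drop the current worst result
    return sorted((-d, k) for d, k in best)


# Toy graph: five points on a line, each linked to its immediate neighbors.
toy_vectors = {i: (float(i), 0.0) for i in range(5)}
toy_neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
toy_result = greedy_layer_search(toy_vectors, toy_neighbors, (3.2, 0.0), entry=0, ef=2)
```

The search only ever computes distances to nodes reachable through graph edges, which is why this family of algorithms is approximate: a poorly connected graph can hide the true nearest neighbor.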

Quick Start & Requirements

Install: pip install usearch (Python)
Prerequisites: Python 3, NumPy. C++11 compiler for native builds.
Demo: https://github.com/unum-cloud/usearch/blob/master/examples/usearch-demo.ipynb
Docs: https://github.com/unum-cloud/usearch

Highlighted Details

Claimed 10x faster HNSW implementation than FAISS.
Single-file C++11 header library with minimal dependencies.
Bindings for 10+ languages, including Python, JavaScript, Rust, Java, C#, and Go.
Hardware-agnostic f16 & i8 precision support.
User-defined metrics with Numba, Cppyy, or PeachPy.
Memory-efficient indexing with optional 40-bit neighbor references.
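As a rough illustration of why i8 storage saves memory at little cost in accuracy, the sketch below quantizes unit-length vectors to 8-bit integers and compares cosine similarity before and after. This is a generic scalar quantization scheme written for this summary, not USearch's actual implementation; the function names and the scale factor of 127 are assumptions.

```python
import math


def normalize(v):
    """Scale a vector to unit length so every component lies in [-1, 1]."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def quantize_i8(v, scale=127):
    """Map the components of the normalized vector to integers in [-127, 127]."""
    return [max(-scale, min(scale, round(x * scale))) for x in normalize(v)]


def cosine_similarity(a, b):
    """Cosine similarity; works for float and integer vectors alike."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


a = [0.1, 0.7, -0.3, 0.5]
b = [0.2, 0.6, -0.1, 0.4]
exact = cosine_similarity(a, b)
approx = cosine_similarity(quantize_i8(a), quantize_i8(b))
```

Each i8 component occupies 1 byte instead of the 4 bytes of a 32-bit float, so the stored vectors shrink by roughly 4x, while `exact` and `approx` stay close for this pair of vectors.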

Maintenance & Community

The project is actively maintained by Ash Vardanian and Unum Cloud. It is integrated with major platforms such as ClickHouse, DuckDB, LangChain, and Microsoft Semantic Kernel. The community can be reached on Discord.

Licensing & Compatibility

USearch is released under the MIT License, allowing for commercial use and integration into closed-source projects.

Limitations & Caveats

Although USearch supports many languages, advanced features such as user-defined metrics, batch operations, and filtering predicates are not available in every binding. Variable-length vectors and 4B+ capacity support are currently limited to the C++ interface.
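The 4B+ figure follows from simple arithmetic: a 32-bit reference can address at most 2^32 (about 4.29 billion) entries, while a 40-bit reference reaches 2^40 (about 1.1 trillion) using 5 bytes instead of 8. The sketch below shows one way to pack such a reference; the helper names and the little-endian byte order are assumptions for illustration, not USearch's on-disk layout.

```python
def pack_uint40(value):
    """Encode an integer in [0, 2**40) as 5 little-endian bytes."""
    if not 0 <= value < 2 ** 40:
        raise ValueError("value does not fit in 40 bits")
    return value.to_bytes(5, "little")


def unpack_uint40(buf):
    """Decode 5 little-endian bytes back into an integer."""
    if len(buf) != 5:
        raise ValueError("expected exactly 5 bytes")
    return int.from_bytes(buf, "little")


MAX_U32 = 2 ** 32 - 1  # 4,294,967,295: ceiling for 32-bit references
MAX_U40 = 2 ** 40 - 1  # 1,099,511,627,775: ceiling for 40-bit references

# A reference beyond the 32-bit ceiling still round-trips through 5 bytes.
big_ref = 5_000_000_000
packed = pack_uint40(big_ref)
```

Compared with a full 64-bit reference, the 5-byte form saves 3 bytes per graph edge, which adds up quickly in an index with many neighbors per node.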

Health Check

Last Commit

5 days ago

Responsiveness

1 day

Star History

162 stars in the last 30 days