Header-only C++ library for fast approximate nearest neighbors
This library provides a header-only C++ implementation of the Hierarchical Navigable Small World (HNSW) algorithm for fast approximate nearest neighbor (ANN) search, with Python bindings. It's designed for researchers and engineers working with large-scale vector datasets who need efficient similarity search capabilities, offering advantages in speed, memory footprint, and incremental index construction over other libraries.
How It Works
The library implements the HNSW graph-based indexing structure. It builds a multi-layer graph in which the upper layers hold progressively fewer points and the bottom layer holds every point. A search starts from an entry point in the top layer and moves greedily towards the query vector, then repeats the process in each lower layer, using the result from the layer above as its starting point. Because the sparse upper layers cover long distances in few hops, most of the dataset is never visited, which makes queries much faster than a brute-force scan at the cost of approximate results. The implementation supports L2, inner product, and cosine similarity metrics.
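The layered descent described above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the function names, the hand-built three-layer graph, and the 1-D toy vectors are all assumptions made for illustration, and the sketch keeps a single best candidate per layer where a real HNSW search keeps a wider candidate list on the bottom layer.

```python
def l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def greedy_layer_search(vectors, neighbors, entry, query):
    """Walk one layer: keep moving to a closer neighbor until none is closer."""
    current = entry
    current_dist = l2(vectors[current], query)
    improved = True
    while improved:
        improved = False
        for candidate in neighbors.get(current, []):
            d = l2(vectors[candidate], query)
            if d < current_dist:
                current, current_dist = candidate, d
                improved = True
    return current


def hierarchical_search(vectors, layers, entry, query):
    """Descend from the sparsest layer (layers[0]) to the densest (layers[-1])."""
    current = entry
    for neighbors in layers:
        # The result of each layer becomes the entry point of the next one.
        current = greedy_layer_search(vectors, neighbors, current, query)
    return current


# Toy data (hypothetical): eight points on a line, ids 0..7.
vectors = {i: [float(i)] for i in range(8)}
layers = [
    {0: [4], 4: [0]},                                 # top: long-range links
    {0: [2], 2: [0, 4], 4: [2, 6], 6: [4]},           # middle
    {i: [j for j in (i - 1, i + 1) if 0 <= j < 8]     # bottom: every point
     for i in range(8)},
]

nearest = hierarchical_search(vectors, layers, entry=0, query=[5.2])
print(nearest)  # prints 5
```

In this toy run the top layer jumps from node 0 to node 4, the middle layer refines that to node 6, and the bottom layer settles on node 5, the true nearest neighbor of 5.2.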
Quick Start & Requirements
Install: pip install hnswlib
Requires: numpy
Highlighted Details
Maintenance & Community
The most recent release noted is version 0.8.0, which adds multi-vector and epsilon search along with bug fixes. Contributions are welcome, and several contributors are credited for recent improvements.
Licensing & Compatibility
The library is released under the MIT License, permitting commercial use and integration into closed-source projects.
Limitations & Caveats
Filtering during search in Python is noted to be slow in multithreaded mode, with a recommendation to use num_threads=1. The ef parameter is not saved with the index and must be reset after loading. Serialization via pickle is not thread-safe with add_items.