hnswlib  by nmslib

Header-only C++ library for fast approximate nearest neighbors

Created 8 years ago
4,900 stars

Top 10.2% on SourcePulse

GitHubView on GitHub
Project Summary

This library provides a header-only C++ implementation of the Hierarchical Navigable Small World (HNSW) algorithm for fast approximate nearest neighbor (ANN) search, with Python bindings. It's designed for researchers and engineers working with large-scale vector datasets who need efficient similarity search capabilities, offering advantages in speed, memory footprint, and incremental index construction over other libraries.

How It Works

The library implements the HNSW graph-based indexing structure. It builds a multi-layer graph where each layer is a sparse graph. During search, it starts from the top layer and navigates towards the target vector, progressively refining the search in lower layers. This hierarchical approach allows for rapid pruning of the search space, leading to significantly faster query times compared to brute-force methods. The implementation supports L2, inner product, and cosine similarity metrics.

Quick Start & Requirements

  • Install: pip install hnswlib
  • Prerequisites: C++11 compiler. Python bindings require numpy.
  • Setup: Installation via pip is typically quick. Building from source is also straightforward.
  • Docs: https://github.com/nmslib/hnswlib

Highlighted Details

  • Header-only C++ implementation with minimal dependencies.
  • Supports incremental index construction, updates, and deletions.
  • Python, C++, Java, and R interfaces available.
  • Customizable distance functions in C++.

Maintenance & Community

The project has seen recent activity with version 0.8.0 released, including multi-vector and epsilon search, and bug fixes. Contributions are welcomed, with several contributors listed for recent improvements.

Licensing & Compatibility

The library is released under the MIT License, permitting commercial use and integration into closed-source projects.

Limitations & Caveats

Filtering during search in Python is noted to be slow in multithreaded mode, with a recommendation to use num_threads=1. The ef parameter is not saved with the index and must be reset after loading. Serialization via pickle is not thread-safe with add_items.

Health Check
Last Commit

4 days ago

Responsiveness

1 week

Pull Requests (30d)
3
Issues (30d)
0
Star History
50 stars in the last 30 days

Explore Similar Projects

Starred by Shizhe Diao Shizhe Diao(Author of LMFlow; Research Scientist at NVIDIA), Simon Willison Simon Willison(Coauthor of Django), and
1 more.

faiss_tips by matsui528

0.2%
622
Faiss tips and tricks
Created 7 years ago
Updated 2 weeks ago
Feedback? Help us improve.