hnswlib  by nmslib

Header-only C++ library for fast approximate nearest neighbors

created 8 years ago
4,826 stars

Top 10.5% on sourcepulse

GitHubView on GitHub
Project Summary

This library provides a header-only C++ implementation of the Hierarchical Navigable Small World (HNSW) algorithm for fast approximate nearest neighbor (ANN) search, with Python bindings. It's designed for researchers and engineers working with large-scale vector datasets who need efficient similarity search capabilities, offering advantages in speed, memory footprint, and incremental index construction over other libraries.

How It Works

The library implements the HNSW graph-based indexing structure. It builds a multi-layer graph where each layer is a sparse graph. During search, it starts from the top layer and navigates towards the target vector, progressively refining the search in lower layers. This hierarchical approach allows for rapid pruning of the search space, leading to significantly faster query times compared to brute-force methods. The implementation supports L2, inner product, and cosine similarity metrics.

Quick Start & Requirements

  • Install: pip install hnswlib
  • Prerequisites: C++11 compiler. Python bindings require numpy.
  • Setup: Installation via pip is typically quick. Building from source is also straightforward.
  • Docs: https://github.com/nmslib/hnswlib

Highlighted Details

  • Header-only C++ implementation with minimal dependencies.
  • Supports incremental index construction, updates, and deletions.
  • Python, C++, Java, and R interfaces available.
  • Customizable distance functions in C++.

Maintenance & Community

The project has seen recent activity with version 0.8.0 released, including multi-vector and epsilon search, and bug fixes. Contributions are welcomed, with several contributors listed for recent improvements.

Licensing & Compatibility

The library is released under the MIT License, permitting commercial use and integration into closed-source projects.

Limitations & Caveats

Filtering during search in Python is noted to be slow in multithreaded mode, with a recommendation to use num_threads=1. The ef parameter is not saved with the index and must be reset after loading. Serialization via pickle is not thread-safe with add_items.

Health Check
Last commit

1 month ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
169 stars in the last 90 days

Explore Similar Projects

Feedback? Help us improve.