nsg  by ZJULearning

ANN search algorithm for metric-free, large-scale vector search

created 7 years ago
684 stars

Top 50.6% on sourcepulse

GitHubView on GitHub
Project Summary

NSG is a C++ library for approximate nearest neighbor search (ANNS) on large-scale, dense real vectors. It offers a flexible and efficient graph-based approach, suitable for researchers and engineers working with high-dimensional data, particularly in e-commerce scenarios. The project has been integrated into Taobao's search engine for billion-scale ANNS.

How It Works

NSG constructs a "Navigating Spread-out Graph" where nodes represent data points and edges represent approximate nearest neighbor relationships. This graph structure allows for efficient traversal during search. The approach is advantageous for its metric-free nature and ability to handle large datasets, outperforming other graph-based ANNS algorithms in terms of index size and search performance in reported benchmarks.

Quick Start & Requirements

  • Install: Clone the repository and build using CMake.
    git clone https://github.com/ZJULearning/nsg.git
    cd nsg/
    mkdir build/ && cd build/
    cmake -DCMAKE_BUILD_TYPE=Release ..
    make -j
    
  • Prerequisites: GCC 4.9+, CMake 2.8+, Boost 1.55+, TCMalloc. Crucially, requires CPU support for AVX-256 instructions (check with cat /proc/cpuinfo | grep avx2). Data must be aligned to 8 or 16 float/int elements using provided data_align() function.
  • Docker: A Dockerfile is provided for building and running the image.
  • Usage: Build an NSG index from a pre-built kNN graph using test_nsg_index, then perform searches with test_nsg_optimized_search or test_nsg_search.
  • Links: PVLDB Paper

Highlighted Details

  • Achieved best search performance among compared algorithms on SIFT1M, GIST1M, RAND4M, and GAUSS5M datasets.
  • Offers the smallest index size among graph-based ANNS algorithms.
  • Demonstrated 1ms average latency for 10,000 queries on 10 million 128-dimensional vectors in a single-thread e-commerce scenario.
  • Supports distributed search across multiple NSG indices.

Maintenance & Community

The project is associated with ZJU Learning and has been integrated into Alibaba's Taobao. The primary contact is via email. A TODO list includes adding Docker support (completed), improving SIMD compatibility, adding a Python wrapper, and integrating Travis CI.

Licensing & Compatibility

NSG is MIT-licensed, permitting commercial use and integration into closed-source projects.

Limitations & Caveats

The current implementation only supports int32 and float32 data types. SIMD-related code compatibility may require further improvement. A Python wrapper is planned but not yet available.

Health Check
Last commit

1 year ago

Responsiveness

1 week

Pull Requests (30d)
0
Issues (30d)
0
Star History
17 stars in the last 90 days

Explore Similar Projects

Feedback? Help us improve.