ANN search algorithm for metric-free, large-scale vector search
NSG is a C++ library for approximate nearest neighbor search (ANNS) on large-scale, dense real vectors. It offers a flexible and efficient graph-based approach, suitable for researchers and engineers working with high-dimensional data, particularly in e-commerce scenarios. The project has been integrated into Taobao's search engine for billion-scale ANNS.
How It Works
NSG constructs a "Navigating Spread-out Graph" where nodes represent data points and edges represent approximate nearest neighbor relationships. This graph structure allows for efficient traversal during search. The approach is advantageous for its metric-free nature and ability to handle large datasets, outperforming other graph-based ANNS algorithms in terms of index size and search performance in reported benchmarks.
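The traversal idea can be illustrated with a minimal greedy walk over a neighbor graph. This is only a sketch under stated assumptions, not NSG's actual search routine: the names `Point`, `Graph`, `squared_l2`, and `greedy_search` are hypothetical, squared Euclidean distance stands in for whatever distance the index uses, and a real search would typically keep a pool of candidates rather than a single current node.

```cpp
#include <cstddef>
#include <vector>

using Point = std::vector<float>;
using Graph = std::vector<std::vector<std::size_t>>;  // adjacency lists, one per node

// Squared Euclidean distance (illustrative choice; another distance could be swapped in).
inline float squared_l2(const Point& a, const Point& b) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Hypothetical helper: starting at `entry`, repeatedly move to the neighbor
// closest to the query, and stop when no neighbor is closer than the current node.
inline std::size_t greedy_search(const std::vector<Point>& data,
                                 const Graph& graph,
                                 const Point& query,
                                 std::size_t entry) {
    std::size_t current = entry;
    float best = squared_l2(data[current], query);
    while (true) {
        std::size_t next = current;
        for (std::size_t nb : graph[current]) {
            const float d = squared_l2(data[nb], query);
            if (d < best) {  // strictly closer neighbor found
                best = d;
                next = nb;
            }
        }
        if (next == current) break;  // local minimum reached
        current = next;
    }
    return current;
}
```

Because the best distance strictly decreases on every move, the walk always terminates. It can stop at a local minimum, which is why the result is approximate and why the quality of the graph matters.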
Quick Start & Requirements
git clone https://github.com/ZJULearning/nsg.git
cd nsg/
mkdir build/ && cd build/
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j
To check whether the CPU supports AVX2, run:
cat /proc/cpuinfo | grep avx2
Data must be aligned to 8 or 16 float/int elements using the provided data_align() function.
Build the index with test_nsg_index, then perform searches with test_nsg_optimized_search or test_nsg_search.
Highlighted Details
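The alignment requirement noted under Quick Start & Requirements can be pictured with a short sketch. It assumes that "aligned to 8 or 16 elements" means each vector's dimension is rounded up to a multiple of that factor and the extra slots are zero-filled. The helpers `round_up` and `pad_rows` are hypothetical and are not the library's data_align(), whose exact signature and memory handling are not shown here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: round `dim` up to the next multiple of `factor` (e.g. 8 or 16).
inline std::size_t round_up(std::size_t dim, std::size_t factor) {
    return (dim + factor - 1) / factor * factor;
}

// Hypothetical helper: copy `num` row-major vectors of length `dim` into a new
// buffer whose rows have the padded length; the padding slots stay zero.
inline std::vector<float> pad_rows(const std::vector<float>& data,
                                   std::size_t num,
                                   std::size_t dim,
                                   std::size_t factor) {
    const std::size_t padded = round_up(dim, factor);
    std::vector<float> out(num * padded, 0.0f);
    for (std::size_t i = 0; i < num; ++i) {
        for (std::size_t j = 0; j < dim; ++j) {
            out[i * padded + j] = data[i * dim + j];
        }
    }
    return out;
}
```

Zero padding leaves Euclidean distances and inner products between vectors unchanged, so under this assumption the padded data gives the same neighbors while letting SIMD loops process fixed-width chunks.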
Maintenance & Community
The project is associated with ZJU Learning and has been integrated into Alibaba's Taobao. The primary contact is via email. The TODO list covers Docker support (completed), improved SIMD compatibility, a Python wrapper, and Travis CI integration.
Licensing & Compatibility
NSG is MIT-licensed, permitting commercial use and integration into closed-source projects.
Limitations & Caveats
The current implementation only supports int32 and float32 data types. SIMD-related code compatibility may require further improvement. A Python wrapper is planned but not yet available.