ANN benchmarks for approximate nearest neighbor search algorithms
Top 9.5% on sourcepulse
This project provides a comprehensive benchmarking framework for approximate nearest neighbor (ANN) search libraries, targeting researchers and engineers working with high-dimensional data. It offers standardized datasets, Dockerized environments for each algorithm, and tools for reproducible evaluation, enabling objective comparison of ANN library performance.
How It Works
The framework utilizes pre-generated HDF5 datasets with ground truth for top-100 nearest neighbors. Each ANN library is encapsulated within a Docker container, ensuring consistent execution environments. Benchmarking is performed using Python scripts that orchestrate the indexing, querying, and result collection, with a focus on single-CPU saturation and reproducible parameter tuning.
Quick Start & Requirements
pip install -r requirements.txt
followed by python install.py
.install.py
can take 10-30 minutes. Running benchmarks (run.py
) can take days.Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project focuses on CPU-based ANN algorithms and datasets that fit in RAM; billion-scale benchmarks are handled by a separate project. GPU support for libraries like FAISS requires local compilation and specific flags. The README mentions results are as of April 2025, implying potential for updates.
1 month ago
Inactive