ann-benchmarks  by erikbern

ANN benchmarks for approximate nearest neighbor search algorithms

created 10 years ago
5,384 stars

Top 9.5% on sourcepulse

Project Summary

This project provides a comprehensive benchmarking framework for approximate nearest neighbor (ANN) search libraries, targeting researchers and engineers working with high-dimensional data. It offers standardized datasets, Dockerized environments for each algorithm, and tools for reproducible evaluation, enabling objective comparison of ANN library performance.

How It Works

The framework uses pre-generated HDF5 datasets that ship with ground truth for each query's top-100 nearest neighbors. Each ANN library runs inside its own Docker container, which keeps execution environments consistent. Python scripts orchestrate indexing, querying, and result collection, with a focus on single-CPU saturation and reproducible parameter tuning.
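
To make the role of the ground truth concrete, here is a minimal sketch of how an approximate result could be scored against exact neighbors. It is not the project's own code: the function names, the toy data, and the simple set-overlap definition of recall are all assumptions for illustration, and the framework's actual metric may be defined differently.

```python
import math

def brute_force_knn(train, query, k):
    """Return indices of the k training points closest to `query` (Euclidean)."""
    order = sorted(range(len(train)), key=lambda i: math.dist(train[i], query))
    return order[:k]

def recall_at_k(approx_ids, true_ids, k):
    """Fraction of the true top-k neighbors that the approximate result recovered."""
    return len(set(approx_ids[:k]) & set(true_ids[:k])) / k

# Toy points standing in for a dataset's indexed vectors (hypothetical data).
train = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0), (6.0, 5.0)]
query = (0.1, 0.1)

true_ids = brute_force_knn(train, query, k=3)  # exact ground truth
approx_ids = [0, 1, 3]                         # pretend output of an ANN index
recall = recall_at_k(approx_ids, true_ids, k=3)
print(true_ids, recall)
```

Precomputing the exact neighbors once, as the datasets do, means every library is scored against the same reference without repeating the brute-force search.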

Quick Start & Requirements

  • Install: run pip install -r requirements.txt, then python install.py to build the Docker images for each library.
  • Prerequisites: Python (3.10.6 tested), Docker.
  • Setup Time: install.py can take 10-30 minutes. Running benchmarks (run.py) can take days.
  • Links: ann-benchmarks.com

Highlighted Details

  • Benchmarks over 40 ANN libraries including FAISS, NMSLIB, ScaNN, and Elasticsearch.
  • Supports various datasets (SIFT, GloVe, MNIST, etc.) with dimensions from 25 to 27,983.
  • Results are presented as plots and can be used to generate a website.
  • Includes a reproducibility protocol and related publications.
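
The plots compare parameter configurations by how much recall they achieve at a given query throughput. As a rough sketch of that idea, the snippet below keeps only the configurations that no other configuration beats on both axes. The function name, the configuration labels, and all numbers are invented for illustration and are not taken from the project or its results.

```python
def pareto_frontier(runs):
    """Keep runs that no other run beats on both recall and QPS.

    `runs` is a list of (label, recall, qps) tuples.
    """
    # Sort by recall descending, then QPS descending, and sweep once:
    # a run survives only if its QPS beats every run with higher recall.
    ordered = sorted(runs, key=lambda r: (-r[1], -r[2]))
    frontier, best_qps = [], float("-inf")
    for label, recall, qps in ordered:
        if qps > best_qps:
            frontier.append((label, recall, qps))
            best_qps = qps
    return frontier

# Hypothetical measurements: (label, recall, queries per second).
runs = [
    ("config-a", 0.99, 500.0),
    ("config-b", 0.95, 2000.0),
    ("config-c", 0.90, 1500.0),  # dominated by config-b
    ("config-d", 0.80, 8000.0),
]
frontier = pareto_frontier(runs)
for label, recall, qps in frontier:
    print(label, recall, qps)
```

Filtering to the frontier keeps a plot readable when a library has been run with many parameter settings, since dominated settings add no information about the best achievable trade-off.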

Maintenance & Community

  • Authors: Erik Bernhardsson, Martin Aumüller, Alexander Faithfull.
  • Open to pull requests for improvements and new library integrations.

Licensing & Compatibility

  • License: Not explicitly stated in the README.
  • Compatibility: Primarily CPU-based algorithms; GPU support is mentioned for FAISS but requires local compilation. Datasets fit in RAM.

Limitations & Caveats

The project focuses on CPU-based ANN algorithms and datasets that fit in RAM; billion-scale benchmarks are handled by a separate project. GPU support for libraries like FAISS requires local compilation and specific flags. The README dates its results to April 2025, so they may be updated later.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 4
  • Star History: 133 stars in the last 90 days
