autofaiss  by criteo

Auto indexer for k-NN similarity search using Faiss

created 4 years ago
867 stars

Top 42.3% on sourcepulse

Project Summary

AutoFaiss automates the creation of efficient Faiss k-NN indexes for users who need large-scale similarity search that balances recall and query speed under memory constraints. It simplifies the selection of Faiss index types and hyperparameters, making it possible to build a massive index (e.g., 200M vectors, 1TB) on modest hardware (e.g., 15GB RAM) and query it with millisecond latency.
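
To see why index selection matters at this scale, the figures quoted above can be turned into a rough per-vector budget. The sketch below is only arithmetic on those numbers. It assumes decimal units (1TB = 10^12 bytes, 1GB = 10^9 bytes), float32 embeddings, and that the whole index should fit in RAM; none of these assumptions is stated in the source.

```python
# Back-of-the-envelope budget from the figures quoted in the summary.
# Assumptions (not from the source): decimal units, float32 embeddings,
# and an index that must fit entirely in the available RAM.
N_VECTORS = 200_000_000
RAW_BYTES = 10**12         # 1TB of raw embeddings
RAM_BYTES = 15 * 10**9     # 15GB of RAM
FLOAT32_BYTES = 4

# Size of one uncompressed embedding, and the dimension that implies.
raw_bytes_per_vector = RAW_BYTES / N_VECTORS
implied_dimension = raw_bytes_per_vector / FLOAT32_BYTES

# RAM available per vector, and the compression needed to get there.
ram_bytes_per_vector = RAM_BYTES / N_VECTORS
compression_ratio = raw_bytes_per_vector / ram_bytes_per_vector

print(f"raw size per vector: {raw_bytes_per_vector:.0f} bytes")
print(f"implied dimension:   {implied_dimension:.0f}")
print(f"RAM per vector:      {ram_bytes_per_vector:.0f} bytes")
print(f"compression needed:  {compression_ratio:.1f}x")
```

Under these assumptions each vector has to fit in roughly 75 bytes, about 67 times smaller than its raw form, so an index that stores raw vectors cannot meet the budget and a compressed index type is required.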

How It Works

AutoFaiss takes a heuristic-driven approach: it combines Faiss's efficient index types with binary search to determine the best indexing parameters automatically. It balances recall, query speed, and memory usage against user-defined constraints, which makes it suitable for large datasets where manual tuning is impractical. The library supports both in-memory and on-disk embeddings, with an option for memory-mapped indices to further reduce the memory footprint.
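
The role of binary search can be illustrated with a small sketch. This is not AutoFaiss's implementation: the helper `largest_value_within_budget` and the cost model `toy_latency_us` are hypothetical, and the parameter is named after Faiss's IVF `nprobe` setting purely for illustration. The sketch assumes that latency never decreases as the parameter grows, so the largest value that still meets a latency budget can be found in a logarithmic number of measurements.

```python
from typing import Callable, Optional


def largest_value_within_budget(
    cost: Callable[[int], int], budget: int, low: int, high: int
) -> Optional[int]:
    """Return the largest value in [low, high] with cost(value) <= budget.

    Assumes cost is non-decreasing in value. Returns None when even
    `low` exceeds the budget.
    """
    best = None
    while low <= high:
        mid = (low + high) // 2
        if cost(mid) <= budget:
            best = mid       # mid fits the budget, try something larger
            low = mid + 1
        else:
            high = mid - 1   # mid is too expensive, try something smaller
    return best


def toy_latency_us(nprobe: int) -> int:
    """Hypothetical latency model in microseconds (not a real measurement)."""
    return 500 + 100 * nprobe


# Largest hypothetical nprobe that keeps a query within 10 ms.
best_nprobe = largest_value_within_budget(
    toy_latency_us, budget=10_000, low=1, high=1024
)
print(best_nprobe)  # 95
```

In a real tuning loop the cost function would time actual queries against the index rather than evaluate a formula, and a higher parameter value would typically be preferred because it tends to improve recall.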

Quick Start & Requirements

  • Install: pip install autofaiss
  • Prerequisites: Python 3.x. GPU support is experimental and not tested. PySpark is required for distributed indexing.
  • Resources: Can build a 1TB index with 15GB RAM.
  • Docs: Official Documentation, Colab Notebooks
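
A minimal sketch of building and querying an in-memory index, based on the `build_index` entry point shown in the project's README. The wrapper function `build_and_query_demo` and the random data are illustrative only, and parameter names and return values should be checked against the Official Documentation for the installed version.

```python
def build_and_query_demo(n_vectors: int = 100, dim: int = 512, k: int = 1):
    """Build an in-memory index over random vectors and run one query."""
    # Imported lazily so this function can be defined even when the
    # packages are not installed yet.
    import numpy as np
    from autofaiss import build_index

    rng = np.random.default_rng(0)
    embeddings = rng.random((n_vectors, dim), dtype=np.float32)

    # save_on_disk=False keeps the index in memory instead of writing files.
    index, index_infos = build_index(embeddings, save_on_disk=False)

    # The returned object is a Faiss index, so it is queried with search().
    query = rng.random((1, dim), dtype=np.float32)
    distances, ids = index.search(query, k)
    return ids


# Usage (requires `pip install autofaiss`):
#   print(build_and_query_demo())
```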

Highlighted Details

  • Automates Faiss index selection for optimal recall, query speed, and memory usage.
  • Supports building large indexes (200M vectors, 1TB) on limited hardware (15GB RAM).
  • Offers memory-mapped index creation for reduced memory footprint.
  • Provides distributed index building capabilities via PySpark.

Maintenance & Community

  • Developed and maintained by Criteo; the last commit was 1 year ago.
  • No Discord or Slack community is mentioned.

Licensing & Compatibility

  • License: MIT.
  • Compatible with commercial and closed-source applications.

Limitations & Caveats

  • GPU usage is experimental and not tested.
  • Memory-mapped indices are limited to Faiss IVF index types.
  • The make_direct_map option for vector reconstruction can significantly increase RAM usage.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 15 stars in the last 90 days