autofaiss by criteo

Auto indexer for k-NN similarity search using Faiss

Created 4 years ago
892 stars

Top 40.5% on SourcePulse

Project Summary

AutoFaiss automates the creation of efficient Faiss k-NN indices for users who need large-scale similarity search that balances recall, query speed, and memory use. It takes over the selection of Faiss index types and hyperparameters, so users can build very large indexes (e.g., 200M vectors, 1TB) on modest hardware (e.g., 15GB RAM) while keeping query latency in the millisecond range.
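
To put the headline figures in perspective, the following back-of-the-envelope calculation works out what they imply per vector. It is only a sketch under stated assumptions: decimal units (1TB = 10^12 bytes), float32 embeddings, the 1TB figure referring to the raw embeddings, and the full 15GB being available for the index. None of these assumptions is confirmed by the summary above.

```python
# Figures quoted in the summary; unit interpretation is an assumption.
NUM_VECTORS = 200_000_000
RAW_BYTES = 1_000_000_000_000    # assumed: 1TB of raw embeddings, decimal units
RAM_BYTES = 15_000_000_000       # assumed: 15GB fully usable by the index
FLOAT32_BYTES = 4                # assumed: embeddings stored as float32

# Size of one uncompressed vector and the dimension that size would imply.
raw_bytes_per_vector = RAW_BYTES / NUM_VECTORS
implied_dim = raw_bytes_per_vector / FLOAT32_BYTES

# Per-vector budget if everything has to fit in RAM.
ram_bytes_per_vector = RAM_BYTES / NUM_VECTORS
compression_ratio = RAW_BYTES / RAM_BYTES

print(f"raw bytes per vector:  {raw_bytes_per_vector:.0f}")
print(f"implied float32 dims:  {implied_dim:.0f}")
print(f"RAM bytes per vector:  {ram_bytes_per_vector:.0f}")
print(f"compression needed:    {compression_ratio:.1f}x")
```

Under these assumptions each vector has to shrink from about 5,000 bytes to about 75 bytes, roughly a 67x reduction, which is why compressed index types and careful parameter selection matter at this scale.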

How It Works

AutoFaiss takes a heuristic-driven approach: it combines Faiss's efficient index types with binary search to determine indexing parameters automatically. It balances recall, query speed, and memory usage against user-defined constraints, which makes it suitable for large datasets where manual tuning would be prohibitively expensive. The library supports both in-memory and on-disk embeddings, with an option to build memory-mapped indices that further reduce the memory footprint.
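
The role of binary search in this kind of tuning can be illustrated with a small, self-contained sketch. It is not AutoFaiss's actual implementation: the helper pick_largest_within_budget, the toy latency model, and the candidate list are all hypothetical, and the sketch assumes that latency never decreases as the search parameter grows (as one would expect for a parameter such as Faiss's nprobe).

```python
from typing import Callable, Optional, Sequence


def pick_largest_within_budget(
    candidates: Sequence[int],
    latency_ms: Callable[[int], float],
    budget_ms: float,
) -> Optional[int]:
    """Return the largest candidate whose latency fits the budget.

    Assumes `candidates` is sorted ascending and `latency_ms` is
    non-decreasing, so binary search is valid.
    """
    lo, hi = 0, len(candidates) - 1
    best: Optional[int] = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if latency_ms(candidates[mid]) <= budget_ms:
            best = candidates[mid]   # fits: remember it and try larger values
            lo = mid + 1
        else:
            hi = mid - 1             # too slow: try smaller values
    return best


def toy_latency_ms(nprobe: int) -> float:
    """Hypothetical latency model; a real tuner would time actual queries."""
    return 0.5 + 0.08 * nprobe


nprobe_values = [1, 2, 4, 8, 16, 32, 64, 128, 256]
chosen = pick_largest_within_budget(nprobe_values, toy_latency_ms, budget_ms=10.0)
print(chosen)  # 64 under the toy model above
```

A larger search parameter generally means higher recall at higher latency, so picking the largest value that still meets the latency budget is one simple way to trade the two off. Each probe of a real index costs a timed query batch, which is why a logarithmic number of probes is attractive.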

Quick Start & Requirements

  • Install: pip install autofaiss
  • Prerequisites: Python 3.x. GPU support is experimental and not tested. PySpark is required for distributed indexing.
  • Resources: Can build a 1TB index with 15GB RAM.
  • Docs: Official Documentation, Colab Notebooks
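
A minimal usage sketch, closely following the example in the project's README: build an index from an in-memory NumPy array and run a query against it. The helper name build_and_search is illustrative. The sketch assumes autofaiss and NumPy are installed and that build_index accepts an array and returns the index together with an info dictionary when save_on_disk=False; check the official documentation for the current signature.

```python
import numpy as np


def build_and_search(embeddings: np.ndarray, queries: np.ndarray, k: int = 1) -> np.ndarray:
    """Build an in-memory index with autofaiss and return neighbor ids."""
    # Imported here so this module still loads where autofaiss is not installed.
    from autofaiss import build_index

    # save_on_disk=False keeps the index in memory instead of writing files.
    index, index_infos = build_index(embeddings, save_on_disk=False)

    # The returned object is a Faiss index; search returns (distances, ids).
    _, ids = index.search(queries, k)
    return ids


# Example call (requires autofaiss):
#   rng = np.random.default_rng(0)
#   embeddings = rng.random((100, 512), dtype=np.float32)
#   query = rng.random((1, 512), dtype=np.float32)
#   print(build_and_search(embeddings, query, k=1))
```

build_index also takes constraint arguments such as max_index_memory_usage and max_index_query_time_ms; their names and defaults should be confirmed against the official documentation for the installed version.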

Highlighted Details

  • Automates Faiss index selection for optimal recall, query speed, and memory usage.
  • Supports building large indexes (200M vectors, 1TB) on limited hardware (15GB RAM).
  • Offers memory-mapped index creation for reduced memory footprint.
  • Provides distributed index building capabilities via PySpark.

Maintenance & Community

  • Active development by Criteo.
  • No Discord or Slack community channel is mentioned.

Licensing & Compatibility

  • License: MIT.
  • Compatible with commercial and closed-source applications.

Limitations & Caveats

  • GPU usage is experimental and not tested.
  • Memory-mapped indices are limited to Faiss IVF index types.
  • The make_direct_map option for vector reconstruction can significantly increase RAM usage.
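
To give a rough sense of scale for the last caveat, here is a small estimate of the extra RAM a direct map could need. It assumes one 8-byte entry per indexed vector, which is an assumption about an array-style direct map and not a figure from the project; the function name is illustrative, and the 200M-vector and 15GB figures are the ones quoted earlier on this page.

```python
def direct_map_overhead_bytes(num_vectors: int, bytes_per_entry: int = 8) -> int:
    """Estimate direct-map size, assuming one fixed-size entry per vector."""
    return num_vectors * bytes_per_entry


NUM_VECTORS = 200_000_000        # example size quoted in the summary
RAM_BYTES = 15_000_000_000       # assumed: 15GB in decimal units

overhead = direct_map_overhead_bytes(NUM_VECTORS)
share_of_ram = overhead / RAM_BYTES

print(f"direct map overhead: {overhead / 1e9:.1f} GB")
print(f"share of 15GB RAM:   {share_of_ram:.1%}")
```

Under this assumption the map alone would take about 1.6 GB, or roughly a tenth of a 15GB budget, on top of the index itself.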

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 3 stars in the last 30 days
