Puck by Baidu

ANN search engine library for industrial deployment

created 1 year ago
362 stars

Top 78.7% on sourcepulse

Project Summary

Puck is a high-performance Approximate Nearest Neighbor (ANN) search engine designed for industrial deployment scenarios with memory and resource constraints. It offers two algorithms, Puck and Tinker, targeting large-scale and smaller datasets respectively, with Python wrappers for ease of use.

How It Works

Puck employs a two-layer inverted index combined with multi-level quantization, compressing vectors to roughly 1/4 of their original size for significant memory savings. Tinker, optimized for smaller datasets, prioritizes search performance by storing similarity relationships, though it requires more memory than Puck. Both algorithms support cosine similarity, L2, and inner product (IP) distances.
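As a rough illustration of that design, the NumPy sketch below builds a toy two-layer inverted index: a coarse k-means layer, a second k-means layer over the residuals, and int8-quantized final residuals (giving roughly 1/4 the memory of float32 storage). Everything here, from function names to parameter choices, is illustrative only and is not Puck's actual API or code.

```python
# Toy two-layer inverted index with residual quantization (NumPy only).
# This sketches the general technique described above; it is NOT Puck's code.
import numpy as np

rng = np.random.default_rng(0)


def kmeans(x, k, iters=10):
    """Tiny k-means used to build both index layers."""
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((x[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = x[assign == j].mean(axis=0)
    return centers, assign


# Toy dataset: 5,000 base vectors of dimension 32.
base = rng.standard_normal((5000, 32)).astype(np.float32)

# Layer 1: coarse cells. Layer 2: sub-cells fit on the residuals to the coarse centers.
coarse_centers, coarse_assign = kmeans(base, k=16)
residuals = base - coarse_centers[coarse_assign]
fine_centers, fine_assign = kmeans(residuals, k=64)

# Quantize what is left after both layers to int8 codes: ~1/4 the memory of float32.
leftover = residuals - fine_centers[fine_assign]
scale = np.abs(leftover).max() / 127.0
codes = np.round(leftover / scale).astype(np.int8)

# Inverted lists: coarse cell id -> ids of the vectors stored in that cell.
inverted = {c: np.where(coarse_assign == c)[0] for c in range(16)}


def search(query, top_k=10, nprobe=4):
    """Probe the nprobe closest coarse cells, then rank the reconstructed candidates."""
    cell_order = np.argsort(((coarse_centers - query) ** 2).sum(-1))[:nprobe]
    cand = np.concatenate([inverted[c] for c in cell_order])
    recon = (coarse_centers[coarse_assign[cand]]
             + fine_centers[fine_assign[cand]]
             + codes[cand].astype(np.float32) * scale)
    dists = ((recon - query) ** 2).sum(-1)
    return cand[np.argsort(dists)[:top_k]]


print(search(rng.standard_normal(32).astype(np.float32)))
```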

Quick Start & Requirements

  • Install: Build from source using CMake and make. Python wrappers are available via setup.py install (refer to Dockerfile for details).
  • Prerequisites: Intel MKL (required for compilation, download from Intel), Python >= 3.6.0, CMake >= 3.21.
  • Build: cmake -DCMAKE_BUILD_TYPE=Release -DMKLROOT=${MKLROOT} -DBLA_VENDOR=Intel10_64lp_seq -DBLA_STATIC=ON -B build . && cd build && make && make install
  • Resources: Building requires a local Intel MKL installation and non-trivial compile time.
  • Docs: README

Highlighted Details

  • Puck achieved top performance on the billion-scale (1B) datasets in the NeurIPS'21 competition and has improved by roughly 70% since.
  • Tinker outperforms nmslib on big-ann-benchmarks for smaller datasets.
  • Supports cosine, L2, and IP distances, with an IP2COS transform for converting IP search into cosine search (see the sketch after this list).
  • Memory usage for Puck is ~1/4 of the original vectors by default; Tinker uses more memory than the original vectors but less than nmslib.
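The IP2COS transform mentioned above refers to the standard reduction from inner-product search to cosine search: base vectors are scaled into the unit ball and padded with one extra coordinate so that cosine ranking on the padded vectors matches inner-product ranking on the originals. The NumPy sketch below shows that reduction in isolation, as an assumption about the general technique rather than Puck's exact implementation.

```python
# Illustrative IP -> cosine reduction (the general IP2COS idea, not Puck's exact code).
import numpy as np

rng = np.random.default_rng(0)
base = rng.standard_normal((1000, 16)).astype(np.float32)
query = rng.standard_normal(16).astype(np.float32)

# Scale base vectors so every norm is <= 1, then append sqrt(1 - ||x||^2):
# every augmented base vector then has unit norm.
max_norm = np.linalg.norm(base, axis=1).max()
scaled = base / max_norm
extra = np.sqrt(np.clip(1.0 - (scaled ** 2).sum(axis=1, keepdims=True), 0.0, None))
base_aug = np.hstack([scaled, extra])

# The query gets 0 in the extra coordinate, so its dot product with an augmented
# base vector is the original inner product divided by a positive constant.
query_aug = np.append(query, 0.0)

# Cosine ranking on the augmented vectors matches IP ranking on the originals.
cos_rank = np.argsort(-(base_aug @ query_aug) / np.linalg.norm(query_aug))
ip_rank = np.argsort(-(base @ query))
print(np.array_equal(cos_rank[:10], ip_rank[:10]))  # expected: True for this toy data
```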

Maintenance & Community

  • The project is hosted by Baidu.
  • A QQ group is available for discussion.

Licensing & Compatibility

  • License details are not explicitly stated in the README. Compatibility for commercial or closed-source use is not specified.

Limitations & Caveats

  • Compilation requires Intel MKL, which may be a significant dependency.
  • The README does not state the license, which makes compatibility assessment for commercial use difficult.
Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 11 stars in the last 90 days

