xhluca: Ultrafast Python BM25 implementation for lexical search
Top 27.2% on SourcePulse
Summary
BM25S (BM25-Sparse) is an ultrafast Python library for lexical search, implementing the BM25 ranking function. It is designed for engineers and researchers seeking high-performance text retrieval without heavy dependencies like Java or PyTorch. The library offers orders-of-magnitude speed improvements over existing Python implementations by leveraging sparse matrices for eager score computation and an optional Numba backend for further acceleration, making it suitable for large-scale text indexing and querying tasks.
How It Works
The core innovation lies in using NumPy and SciPy to build sparse matrices that store precomputed BM25 scores for the tokens that occur in each document. This "eager sparse scoring" approach moves most of the work to indexing time, so a query only has to look up and sum the stored scores for its tokens instead of computing them from scratch. An optional Numba backend further optimizes performance by JIT-compiling Python code, yielding approximately a 2x speedup on larger datasets. This design prioritizes speed and memory efficiency.
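The idea of eager scoring can be illustrated with a minimal sketch. This is not the library's code: plain dictionaries stand in for the SciPy sparse matrix, the function names build_index and search are hypothetical, the scoring formula is one common BM25 variant with a non-negative IDF, and the k1 and b defaults are assumed values.

```python
import math
from collections import Counter, defaultdict


def build_index(corpus_tokens, k1=1.5, b=0.75):
    """Precompute a BM25 score for every (token, document) pair that occurs."""
    n_docs = len(corpus_tokens)
    avg_len = sum(len(doc) for doc in corpus_tokens) / n_docs

    # Document frequency: number of documents containing each token.
    doc_freq = Counter(tok for doc in corpus_tokens for tok in set(doc))

    # index[token][doc_id] = stored score; missing entries are implicit zeros,
    # which is what makes the structure sparse.
    index = defaultdict(dict)
    for doc_id, doc in enumerate(corpus_tokens):
        # Length normalisation term for this document.
        norm = k1 * (1 - b + b * len(doc) / avg_len)
        for tok, tf in Counter(doc).items():
            df = doc_freq[tok]
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
            index[tok][doc_id] = idf * tf * (k1 + 1) / (tf + norm)
    return dict(index), n_docs


def search(index, n_docs, query_tokens, k=3):
    """Sum the stored scores of the query tokens; no scoring math at query time."""
    scores = [0.0] * n_docs
    for tok in query_tokens:
        for doc_id, score in index.get(tok, {}).items():
            scores[doc_id] += score
    ranked = sorted(range(n_docs), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in ranked[:k]]


corpus = [
    "a cat is a feline and likes to purr",
    "a dog is the human's best friend and loves to play",
    "a bird is a beautiful animal that can fly",
]
corpus_tokens = [doc.split() for doc in corpus]  # naive whitespace tokenizer
index, n_docs = build_index(corpus_tokens)
top = search(index, n_docs, "does the cat purr".split(), k=2)
```

All BM25 arithmetic happens in build_index; search only performs lookups and additions, which is the property the sparse-matrix design exploits at larger scale.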
Quick Start & Requirements
Installation is straightforward via pip: pip install bm25s. For enhanced functionality like stemming, install with pip install "bm25s[full]" or pip install PyStemmer. Installing JAX (pip install "jax[cpu]") is optional and can speed up top-k selection. Core dependencies include NumPy and SciPy. Links to a technical report and blog post are mentioned but not provided. Example usage and advanced examples are available within the repository.
Highlighted Details
mmap=True).
Maintenance & Community
The README does not detail specific contributors, sponsorships, or community channels (e.g., Discord, Slack).
Licensing & Compatibility
The primary license for the bm25s project is not explicitly stated in the README. A utility function is noted as being Apache 2.0 licensed, borrowed from the BEIR library. This lack of a clear project-wide license is a significant point for due diligence, especially concerning commercial use or integration into closed-source projects.
Limitations & Caveats
No explicit limitations or known bugs are detailed in the provided text. The project appears actively developed, with recent updates mentioning Numba support.