neural-cherche  by raphaelsty

Library for neural search model fine-tuning and efficient inference

created 2 years ago
362 stars

Top 78.7% on sourcepulse

Project Summary

Neural-Cherche is a Python library for fine-tuning and deploying neural search models such as SPLADE, ColBERT, and SparseEmbed. It targets researchers and developers who need to adapt state-of-the-art retrieval systems to a specific dataset, for both offline and online applications. The library simplifies training, inference, and embedding management.

How It Works

Neural-Cherche fine-tunes models with a triplet loss over (anchor, positive, negative) examples, with the dataset formatted as a list of tuples. ColBERT can be fine-tuned from any Sentence Transformer checkpoint, while SPLADE and SparseEmbed are fine-tuned from MLM pre-trained models. The library also provides efficient inference classes for both the retrieval and ranking stages, so users can build hybrid search systems, and it can save computed embeddings to avoid redundant calculations.
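
The triplet format described above can be sketched without the library. The following is a minimal, library-independent illustration: `iter_batches` and `margin_triplet_loss` are hypothetical helper names, the scores are made-up numbers standing in for model similarity scores, and the margin formulation is one common triplet objective, not necessarily the loss Neural-Cherche uses internally.

```python
from typing import Iterator, List, Sequence, Tuple

# One training example: (anchor query, positive document, negative document).
Triplet = Tuple[str, str, str]

# Toy training set in the tuple format described above (illustrative data).
X: List[Triplet] = [
    ("what is splade", "SPLADE is a sparse neural retriever", "Paris is in France"),
    ("what is colbert", "ColBERT is a late-interaction model", "Bananas are yellow"),
    ("what is bm25", "BM25 is a lexical ranking function", "The sky is blue"),
]


def iter_batches(
    data: Sequence[Triplet], batch_size: int
) -> Iterator[Tuple[List[str], List[str], List[str]]]:
    """Yield (anchors, positives, negatives) lists, one batch at a time."""
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        # Transpose the list of triplets into three parallel lists.
        anchors, positives, negatives = (list(column) for column in zip(*batch))
        yield anchors, positives, negatives


def margin_triplet_loss(
    pos_scores: Sequence[float], neg_scores: Sequence[float], margin: float = 1.0
) -> float:
    """Mean hinge loss: zero once each positive outscores its negative by `margin`."""
    losses = [max(0.0, margin - p + n) for p, n in zip(pos_scores, neg_scores)]
    return sum(losses) / len(losses)


batches = list(iter_batches(X, batch_size=2))

# Hypothetical similarity scores for the first batch of two triplets.
loss = margin_triplet_loss(pos_scores=[2.0, 0.5], neg_scores=[0.5, 0.4])
```

In a real fine-tuning loop the scores would come from the model being trained, and the loss would be backpropagated through it; here they are fixed numbers so the batching and loss arithmetic can be checked by hand.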

Quick Start & Requirements

  • Install: pip install neural-cherche, or pip install "neural-cherche[eval]" to enable evaluation during training.
  • Prerequisites: Python 3.x, PyTorch. GPU or MPS device recommended for training.
  • Documentation: https://neural-cherche.readthedocs.io/en/latest/

Highlighted Details

  • Supports CPU, GPU, and MPS devices.
  • Provides pre-trained checkpoints for ColBERT and SparseEmbed on MS-MARCO.
  • Includes implementations for BM25, TFIDF, SparseEmbed, SPLADE, and ColBERT.
  • Offers a hybrid retrieval pipeline combining BM25 with ColBERT ranking for state-of-the-art results on benchmarks like SciFact.
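
The hybrid pipeline in the last bullet follows a retrieve-then-rerank pattern. Below is a minimal sketch of that pattern, assuming toy stand-ins: `lexical_score` is a simple term-overlap count used in place of BM25, and `rerank_score` is a hypothetical placeholder for a neural ranker such as ColBERT. Neither is the library's implementation, and the documents are illustrative.

```python
from typing import Dict, List, Tuple

# Tiny illustrative collection: document id -> text.
documents: Dict[str, str] = {
    "d1": "colbert is a late interaction neural ranking model",
    "d2": "bm25 is a lexical ranking function",
    "d3": "bananas are rich in potassium",
}


def lexical_score(query: str, text: str) -> float:
    """Toy first-stage score: number of distinct query terms found in the document."""
    return float(len(set(query.split()) & set(text.split())))


def rerank_score(query: str, text: str) -> float:
    """Placeholder second-stage score: fraction of document terms that match the query."""
    terms = text.split()
    query_terms = set(query.split())
    return sum(term in query_terms for term in terms) / len(terms)


def hybrid_search(query: str, k_retrieve: int, k_final: int) -> List[Tuple[str, float]]:
    # Stage 1: cheap scoring over the whole collection, keep the top k_retrieve ids.
    candidates = sorted(
        documents,
        key=lambda doc_id: lexical_score(query, documents[doc_id]),
        reverse=True,
    )[:k_retrieve]
    # Stage 2: costlier scoring restricted to the retrieved candidates.
    reranked = sorted(
        ((doc_id, rerank_score(query, documents[doc_id])) for doc_id in candidates),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return reranked[:k_final]


results = hybrid_search("lexical ranking function", k_retrieve=2, k_final=1)
```

The design point is that the expensive ranker only sees the `k_retrieve` candidates rather than the full collection, which is what makes pairing a fast lexical retriever with a neural ranker practical.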

Maintenance & Community

  • Contributors: Benjamin Clavié, Arthur Satouf.
  • References key papers for SPLADE, SparseEmbed, and ColBERT.

Licensing & Compatibility

  • Library License: MIT.
  • Model Licenses: The SPLADE model is non-commercial only. SparseEmbed and ColBERT are fully open-source, including for commercial use.

Limitations & Caveats

The SPLADE model is restricted to non-commercial use, which may limit its applicability in some enterprise environments. Fine-tuning SPLADE and SparseEmbed requires MLM pre-trained models, which adds a dependency on specific model architectures.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 7 stars in the last 90 days
