sentence-transformers by UKPLab

Framework for text embeddings, retrieval, and reranking

created 6 years ago
17,229 stars

Top 2.7% on sourcepulse

Project Summary

This framework provides an easy method to compute state-of-the-art text embeddings and perform semantic search, similarity, and reranking. It is designed for researchers and developers working with Natural Language Processing (NLP) tasks, offering access to over 10,000 pre-trained models and the ability to train custom models.

How It Works

The library leverages Siamese and triplet network structures (Sentence-BERT) to generate fixed-size embeddings for sentences or paragraphs. This approach allows for efficient similarity calculations using cosine similarity on these embeddings. For improved relevance in search scenarios, it also incorporates Cross-Encoder models that directly score pairs of texts, yielding higher accuracy at a greater computational cost, since every pair must pass through the model.
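The bi-encoder side of this design reduces similarity to a matrix operation over precomputed vectors. A minimal NumPy sketch of that step, with toy 4-dimensional vectors standing in for real model embeddings:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between two sets of row vectors."""
    a_norm = a / np.linalg.norm(a, axis=1, keepdims=True)
    b_norm = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a_norm @ b_norm.T

# Toy "embeddings"; a real model would produce e.g. 384-dim vectors.
emb = np.array([
    [0.1, 0.9, 0.2, 0.0],   # "A man is eating food."
    [0.1, 0.8, 0.3, 0.1],   # "A man is eating a piece of bread."
    [0.9, 0.0, 0.1, 0.4],   # "A cheetah is running behind its prey."
])
sims = cosine_similarity(emb, emb)
# The diagonal is 1.0, and the two food sentences score higher
# with each other than either does with the cheetah sentence.
```

Because the embeddings are fixed-size and computed once per sentence, comparing a new query against a large corpus is a single matrix product rather than a model forward pass per pair.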

Quick Start & Requirements

  • Install with pip: pip install -U sentence-transformers or conda install -c conda-forge sentence-transformers.
  • Requires Python 3.9+, PyTorch 1.11.0+, and transformers v4.34.0+.
  • GPU/CUDA support requires PyTorch installation with matching CUDA version.
  • Official documentation: www.SBERT.net.

Highlighted Details

  • Access to over 10,000 pre-trained models from Hugging Face, including state-of-the-art models from the MTEB leaderboard.
  • Supports training and fine-tuning of custom embedding and reranker models.
  • Offers extensive training options: various transformer architectures (BERT, RoBERTa, etc.), multilingual and multi-task learning, and over 30 loss functions.
  • Enables diverse applications: semantic search, textual similarity, clustering, paraphrase mining, and more.
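Semantic search, the first of those applications, is at its core a nearest-neighbour lookup in embedding space. A hypothetical top-k retrieval sketch in plain NumPy (a real deployment would use the library's own search utilities or an approximate-nearest-neighbour index):

```python
import numpy as np

def top_k_search(query_emb: np.ndarray, corpus_emb: np.ndarray, k: int = 2):
    """Return indices and scores of the k corpus vectors most
    cosine-similar to the query vector."""
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_emb / np.linalg.norm(corpus_emb, axis=1, keepdims=True)
    scores = c @ q                      # cosine similarity per document
    order = np.argsort(-scores)[:k]     # best k, highest first
    return order, scores[order]

# Toy corpus of three pre-embedded documents.
corpus = np.array([
    [0.9, 0.1, 0.0],   # doc 0
    [0.1, 0.9, 0.1],   # doc 1
    [0.8, 0.2, 0.1],   # doc 2
])
query = np.array([1.0, 0.0, 0.0])
idx, top_scores = top_k_search(query, corpus, k=2)
# Docs 0 and 2 point in nearly the same direction as the query,
# so they are retrieved ahead of doc 1.
```

Clustering and paraphrase mining follow the same pattern: embed once, then operate on the vectors with standard geometric tools.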

Maintenance & Community

  • Maintained by Tom Aarsen at UKPLab, TU Darmstadt.
  • Encourages opening issues for problems or questions.
  • Citable publications are provided for the Sentence-BERT and multilingual models.

Licensing & Compatibility

  • The repository is published for research purposes.
  • Licensing details are not explicitly stated in the README, but usage of underlying models may vary.

Limitations & Caveats

The README states the repository contains "experimental software" and is published for "additional background details on the respective publication," suggesting it may not be intended for production use without further evaluation.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 25
  • Issues (30d): 36
  • Star History: 682 stars in the last 90 days
