sgpt by Muennighoff

Code and pretrained GPT models for semantic search

created 3 years ago
869 stars

Top 42.2% on sourcepulse

Project Summary

SGPT provides pre-trained GPT models for semantic search, offering both Bi-Encoder and Cross-Encoder approaches for symmetric and asymmetric search tasks. It's designed for researchers and developers looking to leverage large language models for efficient and accurate information retrieval.

How It Works

SGPT-BE fine-tunes only the bias tensors of GPT models (BitFit) with contrastive learning and applies position-weighted mean pooling to produce semantically rich sentence embeddings. SGPT-CE uses the log probabilities of GPT models without any fine-tuning, scoring how likely a query is given a document. This dual approach gives flexibility in trading off retrieval quality against computational cost.
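To illustrate the bi-encoder side, here is a minimal sketch of position-weighted mean pooling in PyTorch, following the description above; the function name is ours, and the repository's README contains the canonical implementation.

```python
import torch

def weighted_mean_pool(last_hidden_state: torch.Tensor,
                       attention_mask: torch.Tensor) -> torch.Tensor:
    """Position-weighted mean pooling: later tokens, which have attended to
    more context in a causal GPT model, receive linearly higher weights.

    last_hidden_state: (batch, seq_len, hidden) final-layer hidden states
    attention_mask:    (batch, seq_len)         1 for real tokens, 0 for padding
    """
    # Weights 1, 2, ..., seq_len broadcast to the hidden-state shape.
    weights = (
        torch.arange(1, last_hidden_state.shape[1] + 1,
                     device=last_hidden_state.device)
        .float()
        .unsqueeze(0)
        .unsqueeze(-1)
        .expand(last_hidden_state.size())
    )
    mask = attention_mask.unsqueeze(-1).float().expand(last_hidden_state.size())

    # Zero out padding, then normalize by the total weight of real tokens.
    summed = torch.sum(last_hidden_state * mask * weights, dim=1)
    total_weight = torch.sum(mask * weights, dim=1).clamp(min=1e-9)
    return summed / total_weight
```

Applied to the last hidden states of a GPT model, this yields one embedding per input sequence; query and document embeddings are then compared with cosine similarity.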
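For the cross-encoder side, the following is a hedged sketch of log-probability re-ranking: the prompt template and the small gpt2 checkpoint are illustrative placeholders, not the exact ones used by SGPT-CE.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any causal GPT-style checkpoint works for this sketch; SGPT-CE uses GPT models as-is.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def ce_score(query: str, document: str) -> float:
    """Score a (query, document) pair by the summed log-probability the model
    assigns to the query tokens when conditioned on a document prompt.
    The prompt wording below is illustrative, not the repo's exact template."""
    prefix = (
        "Documents are searched to find matches with the same content.\n"
        f'The document "{document}" is a good search result for "'
    )
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    query_ids = tokenizer(query, return_tensors="pt").input_ids
    input_ids = torch.cat([prefix_ids, query_ids], dim=1)

    with torch.no_grad():
        logits = model(input_ids).logits
    log_probs = torch.log_softmax(logits, dim=-1)

    # Logits at position t predict token t + 1, so shift by one to pick the
    # log-probabilities of exactly the query tokens.
    n_prefix = prefix_ids.shape[1]
    query_log_probs = log_probs[0, n_prefix - 1 : input_ids.shape[1] - 1, :]
    targets = input_ids[0, n_prefix:]
    return query_log_probs.gather(1, targets.unsqueeze(-1)).sum().item()

# Re-rank candidate documents for a query by descending score.
docs = ["GPT models can embed sentences.", "Bananas are yellow."]
query = "How do I use GPT for semantic search?"
print(sorted(docs, key=lambda d: ce_score(query, d), reverse=True))
```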

Quick Start & Requirements

  • Installation: Primarily uses Hugging Face Transformers and Sentence Transformers. Install Sentence Transformers with pip install --upgrade git+https://github.com/UKPLab/sentence-transformers.git. For specific SGPT pooling, use pip install --upgrade git+https://github.com/Muennighoff/sentence-transformers.git@sgpt_poolings_specb.
  • Dependencies: Python, PyTorch, Transformers, Sentence Transformers, SciPy. Larger models (e.g., 5.8B) require significant GPU memory.
  • Models: Available on the Hugging Face Hub (e.g., Muennighoff/SGPT-5.8B-weightedmean-nli-bitfit); a minimal usage sketch follows this list.
  • Documentation: Detailed examples and explanations are provided within the repository's README and sub-directory READMEs.
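A minimal usage sketch with sentence-transformers, assuming the install commands above; the smaller 125M checkpoint name used here is an assumption chosen for illustration, and the 5.8B model named above can be swapped in given enough GPU memory.

```python
from sentence_transformers import SentenceTransformer, util

# Smaller checkpoint used here for illustration; replace with
# "Muennighoff/SGPT-5.8B-weightedmean-nli-bitfit" for the full-size model.
model = SentenceTransformer("Muennighoff/SGPT-125M-weightedmean-nli-bitfit")

sentences = [
    "SGPT produces sentence embeddings with GPT models.",
    "Bananas are yellow and grow in warm climates.",
]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Cosine similarity between the two sentence embeddings.
print(util.cos_sim(embeddings[0], embeddings[1]))
```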

Highlighted Details

  • Offers both Bi-Encoder (embedding-based) and Cross-Encoder (re-ranking) models.
  • Introduces novel pooling strategies and special token usage for improved semantic representation.
  • Provides compatibility with the popular sentence-transformers library.
  • Models are available in various sizes, including large parameter counts (e.g., 5.8B).

Maintenance & Community

The repository's most recent updates point to GRIT and GritLM, follow-up models that unify the previous SGPT architectures. The author, Niklas Muennighoff, is a notable contributor in the NLP space. Further updates and model requests can be made via GitHub issues.

Licensing & Compatibility

The project's models are generally available under permissive licenses compatible with commercial use, but specific model licenses on Hugging Face should be verified. The code itself appears to be MIT licensed.

Limitations & Caveats

Larger SGPT models require substantial GPU resources (e.g., >24GB VRAM for 5.8B models). While the paper claims state-of-the-art performance on benchmarks like BEIR and USEB, users should verify performance on their specific use cases. The project recommends newer GRIT/GritLM models, suggesting potential future deprecation of older SGPT models.

Health Check

Last commit: 1 year ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 0
Star History: 6 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Didier Lopes (founder of OpenBB), and 11 more.

sentence-transformers by UKPLab

Framework for text embeddings, retrieval, and reranking

created 6 years ago
updated 3 days ago
17k stars

Top 0.2% on sourcepulse