sgpt by Muennighoff

Code and pretrained GPT models for semantic search

created 3 years ago
869 stars

Top 42.2% on sourcepulse

Project Summary

SGPT provides pre-trained GPT models for semantic search, offering both Bi-Encoder and Cross-Encoder approaches for symmetric and asymmetric search tasks. It's designed for researchers and developers looking to leverage large language models for efficient and accurate information retrieval.

How It Works

SGPT-BE fine-tunes only the bias tensors of GPT models (BitFit) with contrastive learning and applies position-weighted mean pooling to produce semantically rich sentence embeddings. SGPT-CE uses the log probabilities of GPT models without any fine-tuning, scoring how likely a query is given a document. This dual approach gives flexibility in trading off retrieval quality against computational cost.
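To illustrate the bi-encoder side, here is a minimal sketch of position-weighted mean pooling in PyTorch, following the description above; the function name is ours, and the repository's README contains the canonical implementation.

```python
import torch

def weighted_mean_pool(last_hidden_state: torch.Tensor,
                       attention_mask: torch.Tensor) -> torch.Tensor:
    """Position-weighted mean pooling: later tokens, which have attended to
    more context in a causal GPT model, receive linearly higher weights.

    last_hidden_state: (batch, seq_len, hidden) final-layer hidden states
    attention_mask:    (batch, seq_len)         1 for real tokens, 0 for padding
    """
    # Weights 1, 2, ..., seq_len broadcast to the hidden-state shape.
    weights = (
        torch.arange(1, last_hidden_state.shape[1] + 1,
                     device=last_hidden_state.device)
        .float()
        .unsqueeze(0)
        .unsqueeze(-1)
        .expand(last_hidden_state.size())
    )
    mask = attention_mask.unsqueeze(-1).float().expand(last_hidden_state.size())

    # Zero out padding, then normalize by the total weight of real tokens.
    summed = torch.sum(last_hidden_state * mask * weights, dim=1)
    total_weight = torch.sum(mask * weights, dim=1).clamp(min=1e-9)
    return summed / total_weight
```

Applied to the last hidden states of a GPT model, this yields one embedding per input sequence; query and document embeddings are then compared with cosine similarity.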
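For the cross-encoder side, the following is a hedged sketch of log-probability re-ranking: the prompt template and the small gpt2 checkpoint are illustrative placeholders, not the exact ones used by SGPT-CE.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any causal GPT-style checkpoint works for this sketch; SGPT-CE uses GPT models as-is.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def ce_score(query: str, document: str) -> float:
    """Score a (query, document) pair by the summed log-probability the model
    assigns to the query tokens when conditioned on a document prompt.
    The prompt wording below is illustrative, not the repo's exact template."""
    prefix = (
        "Documents are searched to find matches with the same content.\n"
        f'The document "{document}" is a good search result for "'
    )
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    query_ids = tokenizer(query, return_tensors="pt").input_ids
    input_ids = torch.cat([prefix_ids, query_ids], dim=1)

    with torch.no_grad():
        logits = model(input_ids).logits
    log_probs = torch.log_softmax(logits, dim=-1)

    # Logits at position t predict token t + 1, so shift by one to pick the
    # log-probabilities of exactly the query tokens.
    n_prefix = prefix_ids.shape[1]
    query_log_probs = log_probs[0, n_prefix - 1 : input_ids.shape[1] - 1, :]
    targets = input_ids[0, n_prefix:]
    return query_log_probs.gather(1, targets.unsqueeze(-1)).sum().item()

# Re-rank candidate documents for a query by descending score.
docs = ["GPT models can embed sentences.", "Bananas are yellow."]
query = "How do I use GPT for semantic search?"
print(sorted(docs, key=lambda d: ce_score(query, d), reverse=True))
```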

Quick Start & Requirements

  • Installation: Primarily uses Hugging Face Transformers and Sentence Transformers. Install Sentence Transformers with pip install --upgrade git+https://github.com/UKPLab/sentence-transformers.git. For specific SGPT pooling, use pip install --upgrade git+https://github.com/Muennighoff/sentence-transformers.git@sgpt_poolings_specb.
  • Dependencies: Python, PyTorch, Transformers, Sentence Transformers, SciPy. Larger models (e.g., 5.8B) require significant GPU memory.
  • Models: Available on the Hugging Face Hub (e.g., Muennighoff/SGPT-5.8B-weightedmean-nli-bitfit); a minimal usage sketch follows this list.
  • Documentation: Detailed examples and explanations are provided within the repository's README and sub-directory READMEs.
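A minimal usage sketch with sentence-transformers, assuming the install commands above; the smaller 125M checkpoint name used here is an assumption chosen for illustration, and the 5.8B model named above can be swapped in given enough GPU memory.

```python
from sentence_transformers import SentenceTransformer, util

# Smaller checkpoint used here for illustration; replace with
# "Muennighoff/SGPT-5.8B-weightedmean-nli-bitfit" for the full-size model.
model = SentenceTransformer("Muennighoff/SGPT-125M-weightedmean-nli-bitfit")

sentences = [
    "SGPT produces sentence embeddings with GPT models.",
    "Bananas are yellow and grow in warm climates.",
]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Cosine similarity between the two sentence embeddings.
print(util.cos_sim(embeddings[0], embeddings[1]))
```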

Highlighted Details

  • Offers both Bi-Encoder (embedding-based) and Cross-Encoder (re-ranking) models.
  • Introduces novel pooling strategies and special token usage for improved semantic representation.
  • Provides compatibility with the popular sentence-transformers library.
  • Models are available in various sizes, including large parameter counts (e.g., 5.8B).

Maintenance & Community

The repository's most recent updates point to GRIT and GritLM, follow-up models that unify the previous SGPT architectures. The author, Niklas Muennighoff, is a notable contributor in the NLP space. Further updates and model requests can be made via GitHub issues.

Licensing & Compatibility

The project's models are generally available under permissive licenses compatible with commercial use, but specific model licenses on Hugging Face should be verified. The code itself appears to be MIT licensed.

Limitations & Caveats

Larger SGPT models require substantial GPU resources (e.g., >24GB VRAM for 5.8B models). While the paper claims state-of-the-art performance on benchmarks like BEIR and USEB, users should verify performance on their specific use cases. The project recommends newer GRIT/GritLM models, suggesting potential future deprecation of older SGPT models.

Health Check

Last commit: 1 year ago
Responsiveness: 1 day
Pull Requests (30d): 0
Issues (30d): 0
Star History: 6 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Didier Lopes (founder of OpenBB), and 11 more.

sentence-transformers by UKPLab

Framework for text embeddings, retrieval, and reranking

created 6 years ago
updated 3 days ago
17k stars

Top 0.2% on sourcepulse