awesome-sentence-embedding by Separius

Curated list of pretrained sentence/word embedding models

created 6 years ago
2,266 stars

Top 20.4% on sourcepulse

Project Summary

This repository is a curated, albeit incomplete, list of pretrained sentence and word embedding models, aiming to be more up-to-date and comprehensive than existing resources. It serves researchers and practitioners in Natural Language Processing (NLP) looking for state-of-the-art embedding techniques.

How It Works

The project categorizes embeddings into Word Embeddings, Contextualized Word Embeddings, Pooling Methods, and Encoders. It generally follows a framework where word embeddings are processed by an optional encoder (e.g., LSTM) to produce contextualized word embeddings. These are then aggregated using pooling methods (e.g., mean pooling, last pooling) to form sentence representations. The list includes papers, citation counts, and links to training code and pretrained models.
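The pipeline described above (word embeddings, an optional encoder, then pooling into one sentence vector) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not code from any listed model: the 3-dimensional `WORD_VECTORS` table is made up, and `running_mean_encoder` is a hypothetical stand-in for a real contextual encoder such as an LSTM (each position only sees itself and earlier tokens). Only mean pooling and last pooling, the two strategies named in the summary, are shown.

```python
from typing import Callable, Dict, List, Optional

Vector = List[float]

# Hypothetical toy word-embedding table (3-d vectors), for illustration only.
WORD_VECTORS: Dict[str, Vector] = {
    "the": [0.1, 0.0, 0.2],
    "cat": [0.9, 0.3, 0.0],
    "sat": [0.2, 0.8, 0.4],
}


def embed_words(tokens: List[str], table: Dict[str, Vector]) -> List[Vector]:
    """Look up one vector per token; unknown tokens map to a zero vector."""
    dim = len(next(iter(table.values())))
    return [list(table.get(tok, [0.0] * dim)) for tok in tokens]


def running_mean_encoder(vectors: List[Vector]) -> List[Vector]:
    """Hypothetical stand-in for an encoder: position i becomes the mean of
    tokens 0..i, so each output vector carries left-context information."""
    out: List[Vector] = []
    totals = [0.0] * len(vectors[0])
    for i, vec in enumerate(vectors, start=1):
        totals = [t + x for t, x in zip(totals, vec)]
        out.append([t / i for t in totals])
    return out


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average the token vectors dimension by dimension."""
    if not vectors:
        raise ValueError("cannot pool an empty sequence")
    n = len(vectors)
    return [sum(dims) / n for dims in zip(*vectors)]


def last_pool(vectors: List[Vector]) -> Vector:
    """Use the final token's vector (common with left-to-right recurrent encoders)."""
    if not vectors:
        raise ValueError("cannot pool an empty sequence")
    return list(vectors[-1])


def sentence_embedding(
    tokens: List[str],
    table: Dict[str, Vector],
    pool: Callable[[List[Vector]], Vector],
    encoder: Optional[Callable[[List[Vector]], List[Vector]]] = None,
) -> Vector:
    """Word embeddings -> optional encoder -> pooling -> sentence vector."""
    vectors = embed_words(tokens, table)
    if encoder is not None:
        vectors = encoder(vectors)
    return pool(vectors)


if __name__ == "__main__":
    tokens = ["the", "cat", "sat"]
    print(sentence_embedding(tokens, WORD_VECTORS, mean_pool))
    print(sentence_embedding(tokens, WORD_VECTORS, last_pool))
    print(sentence_embedding(tokens, WORD_VECTORS, last_pool, running_mean_encoder))
```

The last call shows why last pooling is paired with recurrent encoders: once every position has absorbed its left context, the final vector summarizes the whole sentence (with this toy encoder it equals the plain mean).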

Quick Start & Requirements

This is a curated list, not a runnable library. Users will need to refer to individual model repositories for installation and usage.

Highlighted Details

  • Comprehensive tables detailing numerous word and contextualized word embedding models, including their publication dates, citation counts, and available code/models.
  • Covers various pooling strategies and encoder architectures used in sentence embedding.
  • Includes sections on evaluation benchmarks and miscellaneous utilities for embedding manipulation.
  • Lists articles and blog posts for further reading on sentence similarity and embedding techniques.
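For the sentence-similarity use case mentioned in the reading list, a common way to compare two sentence embeddings is cosine similarity. The sketch below is a minimal plain-Python version, assuming the embeddings have already been produced by some model; the candidate vectors in the usage example are made up, and `most_similar` is a hypothetical helper, not part of any listed library.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def most_similar(query: Vector, candidates: Dict[str, Vector]) -> str:
    """Hypothetical helper: return the key of the candidate closest to the query."""
    return max(candidates, key=lambda k: cosine_similarity(query, candidates[k]))


if __name__ == "__main__":
    # Made-up 2-d "sentence embeddings", for illustration only.
    candidates = {"a cat sat": [1.0, 0.1], "stock prices fell": [0.0, 1.0]}
    print(most_similar([1.0, 0.0], candidates))
```

Cosine similarity ignores vector length, which is convenient when pooled embeddings of longer and shorter sentences differ in magnitude.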

Maintenance & Community

The repo is maintained by Separius, who invites community pull requests to keep the list up to date.

Licensing & Compatibility

The license of the repository itself is not stated in this summary and should be confirmed in the repo (permissive licenses such as MIT or Apache 2.0 are common for "awesome" lists). The licensing of the individual models listed varies and must be checked separately.

Limitations & Caveats

The README explicitly states the list is "incomplete" and acknowledges that maintaining it is an ongoing effort. Some listed models may be outdated or have broken links.

Health Check
Last commit: 4 years ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 14 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Didier Lopes (Founder of OpenBB), and 11 more.

sentence-transformers by UKPLab
Framework for text embeddings, retrieval, and reranking
0.2%, 17k stars, created 6 years ago, updated 3 days ago