Curated list of pretrained sentence/word embedding models
This repository is a curated, albeit incomplete, list of pretrained sentence and word embedding models, aiming to be more up-to-date and comprehensive than existing resources. It serves researchers and practitioners in Natural Language Processing (NLP) looking for state-of-the-art embedding techniques.
How It Works
The project categorizes embeddings into Word Embeddings, Contextualized Word Embeddings, Pooling Methods, and Encoders. It generally follows a framework where word embeddings are processed by an optional encoder (e.g., LSTM) to produce contextualized word embeddings. These are then aggregated using pooling methods (e.g., mean pooling, last pooling) to form sentence representations. The list includes papers, citation counts, and links to training code and pretrained models.
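To make the pooling step concrete, here is a minimal sketch in Python/NumPy of the two aggregation strategies mentioned above. The sentence length and the 300-dimensional embedding size are illustrative assumptions, not values taken from any listed model.

```python
import numpy as np

# Hypothetical contextualized token embeddings for a 5-token sentence,
# e.g. the per-token outputs of an LSTM or Transformer encoder (5 x 300).
token_embeddings = np.random.rand(5, 300)

# Mean pooling: average the token vectors into one sentence vector.
sentence_vec_mean = token_embeddings.mean(axis=0)

# Last pooling: take the final token's vector as the sentence representation.
sentence_vec_last = token_embeddings[-1]

print(sentence_vec_mean.shape, sentence_vec_last.shape)  # (300,) (300,)
```

Real models in the list differ in how the token embeddings are produced and which pooling variant they use, but the aggregation step generally follows this pattern.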
Quick Start & Requirements
This is a curated list, not a runnable library. Users will need to refer to individual model repositories for installation and usage.
Highlighted Details
Maintenance & Community
The repo is maintained by Separius, with an invitation for community contributions via pull requests to keep the list updated.
Licensing & Compatibility
The repository itself is likely under a permissive license (e.g., MIT, Apache 2.0, common for "awesome" lists), but the licensing of the individual models listed varies and must be checked separately.
Limitations & Caveats
The README explicitly states the list is "incomplete" and acknowledges that maintaining it is an ongoing effort. The repository has not been updated in roughly four years and is marked inactive, so some listed models may be outdated and some links may be broken.