text2vec: a toolkit for converting text into vector embeddings
This repository provides a toolkit for converting text into vector representations, targeting developers and researchers working with natural language processing tasks like semantic similarity and text matching. It offers implementations of various text embedding models, including Word2Vec, Sentence-BERT, and CoSENT, enabling users to efficiently represent and compare textual data.
How It Works
The library implements several text embedding strategies: Word2Vec for word-level embeddings (averaged for sentences), Sentence-BERT (SBERT) for sentence embeddings using supervised training, and CoSENT, which improves upon SBERT with a ranking-based loss function for faster convergence and better performance. It also supports BGE (BAAI General Embedding) models, pre-trained and fine-tuned using contrastive learning.
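As a rough illustration of the word-level versus sentence-level distinction described above, the sketch below averages Word2Vec word vectors into a sentence vector and contrasts that with a Sentence-BERT/CoSENT-style sentence encoder. The class names (SentenceModel, Word2Vec), the default checkpoints, and the model identifier w2v-light-tencent-chinese are assumptions drawn from typical text2vec usage, not a definitive statement of the current API.

```python
# Hedged sketch: word-averaged embeddings vs. sentence-model embeddings.
# Class names and model identifiers are assumptions; check the repository
# README for the exact API and available checkpoints.
import numpy as np
from text2vec import SentenceModel, Word2Vec

sentences = ["如何更换花呗绑定银行卡", "花呗更改绑定银行卡"]

# Word2Vec: word vectors are averaged to form a sentence vector.
w2v = Word2Vec("w2v-light-tencent-chinese")  # assumed model name
w2v_emb = w2v.encode(sentences)              # shape: (2, dim)

# Sentence-BERT / CoSENT: the encoder outputs a sentence vector directly.
sbert = SentenceModel()                      # assumed default checkpoint
sbert_emb = sbert.encode(sentences)          # shape: (2, dim)

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print("word-averaged similarity:", cosine(w2v_emb[0], w2v_emb[1]))
print("sentence-model similarity:", cosine(sbert_emb[0], sbert_emb[1]))
```

In practice the sentence-level models tend to separate paraphrases from unrelated pairs more sharply than averaged word vectors, which is the gap the SBERT and CoSENT training objectives target.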
Quick Start & Requirements
Install the latest release from PyPI:
pip install -U text2vec
Alternatively, install from source: clone the repository, then run pip install torch, followed by pip install -r requirements.txt and pip install --no-deps . (a minimal smoke test follows below).
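To confirm the installation, a minimal encoding call along the following lines should run; it relies on the same assumed SentenceModel API and default checkpoint as the sketch above.

```python
# Minimal post-install smoke test.
# Assumes the SentenceModel class and its default pretrained checkpoint,
# as in the sketch above; adjust the model name if the defaults differ.
from text2vec import SentenceModel

model = SentenceModel()
embeddings = model.encode(["This is a test sentence."])
print(embeddings.shape)  # expected roughly (1, hidden_size), e.g. (1, 768)
```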
Highlighted Details
Maintenance & Community
The repository was last updated about 1 month ago and is currently marked as inactive.
Licensing & Compatibility
Limitations & Caveats