NLP toolkit for leveraging LLM token embeddings
WordLlama is a lightweight NLP toolkit for efficient text similarity, deduplication, ranking, and clustering, optimized for CPU usage. It leverages recycled token embeddings from large language models (LLMs) to provide fast, compact word representations, making it ideal for resource-constrained environments and rapid prototyping.
How It Works
WordLlama extracts token embedding codebooks from LLMs (e.g., LLaMA 2, LLaMA 3) and trains a small, context-less model using average pooling. This approach yields compact embeddings (e.g., 16MB for a 256-dimensional model) that outperform traditional models like GloVe on MTEB benchmarks. It supports Matryoshka Representations for flexible dimension truncation and binary embeddings with Hamming similarity for accelerated computations.
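A minimal sketch of how the Matryoshka truncation and binary (Hamming-similarity) modes might be used. The trunc_dim and binary keyword names are assumptions based on the features described above; verify the exact arguments against the WordLlama documentation.

from wordllama import WordLlama

# Full-size model (256-dimensional embeddings)
wl = WordLlama.load()

# Matryoshka representation: truncate embeddings to fewer dimensions
wl_small = WordLlama.load(trunc_dim=64)
print(wl_small.similarity("fast cpu inference", "efficient on-device embeddings"))

# Binary embeddings: similarity is computed via Hamming distance
wl_binary = WordLlama.load(trunc_dim=64, binary=True)
print(wl_binary.similarity("fast cpu inference", "efficient on-device embeddings"))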
Quick Start & Requirements
pip install wordllama
from wordllama import WordLlama
wl = WordLlama.load()
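A short usage sketch of the core operations (similarity, ranking, deduplication), continuing from the model loaded above. Method names follow the project's README, but treat the exact signatures and defaults as assumptions and confirm them against the installed version.

# Pairwise similarity score between two strings
score = wl.similarity("i went to the car", "i went to the truck")

# Rank candidate documents against a query (most similar first)
candidates = ["i went to the park", "i went to the shop", "i went to the truck"]
ranked = wl.rank("i went to the car", candidates)

# Fuzzy deduplication by similarity threshold
unique_docs = wl.deduplicate(candidates, threshold=0.8)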
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project focuses on CPU performance; GPU acceleration is not explicitly documented. MTEB results are reported, but direct comparisons against current state-of-the-art embedding models are not consistently provided.