NLP toolkit for leveraging LLM token embeddings
WordLlama is a lightweight NLP toolkit for efficient text similarity, deduplication, ranking, and clustering, optimized for CPU usage. It leverages recycled token embeddings from large language models (LLMs) to provide fast, compact word representations, making it ideal for resource-constrained environments and rapid prototyping.
How It Works
WordLlama extracts token embedding codebooks from LLMs (e.g., LLaMA 2, LLaMA 3) and trains a small, context-less model using average pooling. This approach yields compact embeddings (e.g., 16MB for a 256-dimensional model) that outperform traditional models like GloVe on MTEB benchmarks. It supports Matryoshka Representations for flexible dimension truncation and binary embeddings with Hamming similarity for accelerated computations.
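A minimal sketch of how the Matryoshka truncation and binary (Hamming-similarity) modes might be used. The trunc_dim and binary keyword names are assumptions based on the features described above; verify the exact arguments against the WordLlama documentation.

from wordllama import WordLlama

# Full-size model (256-dimensional embeddings)
wl = WordLlama.load()

# Matryoshka representation: truncate embeddings to fewer dimensions
wl_small = WordLlama.load(trunc_dim=64)
print(wl_small.similarity("fast cpu inference", "efficient on-device embeddings"))

# Binary embeddings: similarity is computed via Hamming distance
wl_binary = WordLlama.load(trunc_dim=64, binary=True)
print(wl_binary.similarity("fast cpu inference", "efficient on-device embeddings"))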
Quick Start & Requirements
pip install wordllama
from wordllama import WordLlama
wl = WordLlama.load()
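A short usage sketch of the core operations (similarity, ranking, deduplication), continuing from the model loaded above. Method names follow the project's README, but treat the exact signatures and defaults as assumptions and confirm them against the installed version.

# Pairwise similarity score between two strings
score = wl.similarity("i went to the car", "i went to the truck")

# Rank candidate documents against a query (most similar first)
candidates = ["i went to the park", "i went to the shop", "i went to the truck"]
ranked = wl.rank("i went to the car", candidates)

# Fuzzy deduplication by similarity threshold
unique_docs = wl.deduplicate(candidates, threshold=0.8)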
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project focuses on CPU performance; GPU acceleration is not explicitly documented. MTEB results are reported, but direct comparisons against current state-of-the-art embedding models are not consistently provided.