Inference solution for text embeddings models
Text Embeddings Inference (TEI) is a high-performance inference solution for deploying and serving open-source text embedding and sequence classification models. It targets developers and researchers who need efficient, scalable inference for applications such as RAG, semantic search, and sentiment analysis, offering lower latency and higher throughput than running the same models through a general-purpose serving stack.
How It Works
TEI is written in Rust on top of the Candle ML framework and uses Flash Attention and cuBLASLt kernels for accelerated inference. It supports token-based dynamic batching, Safetensors and ONNX weight loading, and Metal for local execution on Macs. The architecture is designed for low latency and high throughput, and the server is production-ready, with OpenTelemetry distributed tracing and Prometheus metrics for monitoring.
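These observability features can be exercised directly against a running server. A minimal sketch, assuming the default port mapping from the quick start below; the --otlp-endpoint flag appears in the TEI CLI reference, while the /metrics route and the collector address (otel-collector:4317) are assumptions to verify against your deployment:

# Start the server with traces exported to an OTLP collector
# (the collector hostname is a placeholder)
docker run --gpus all -p 8080:80 --pull always \
  ghcr.io/huggingface/text-embeddings-inference:1.7 \
  --model-id BAAI/bge-large-en-v1.5 \
  --otlp-endpoint http://otel-collector:4317

# Scrape Prometheus metrics from the running server
# (assumed to be served on the main port, as with other
# Hugging Face inference servers)
curl 127.0.0.1:8080/metrics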
Quick Start & Requirements
Run the GPU container; the --gpus all flag requires an NVIDIA GPU with the NVIDIA Container Toolkit installed (CPU-tagged images are also published for machines without a GPU):

docker run --gpus all -p 8080:80 -v $PWD/data:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.7 --model-id BAAI/bge-large-en-v1.5
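Once the container is up, embeddings can be requested over the HTTP API. The request below mirrors the example in the upstream TEI README; inputs may be a single string or a list of strings, which the server batches together:

# Embed a single text
curl 127.0.0.1:8080/embed \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "What is Deep Learning?"}'

# Embed several texts in one request
curl 127.0.0.1:8080/embed \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": ["What is Deep Learning?", "What is semantic search?"]}'

The response is a JSON array of embedding vectors, one per input. Sequence classification models are served the same way through the /predict route, with the same inputs shape.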
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats