Discover and explore top open-source AI tools and projects—updated daily.
thushanHigh-performance proxy and load balancer for LLM infrastructure
Top 99.5% on SourcePulse
Summary
Olla is a high-performance, low-overhead proxy and load balancer for LLM infrastructure. It intelligently routes requests across diverse inference backends, offering automatic failover, unified model discovery, and sticky sessions to enhance reliability and efficiency. This tool targets engineers and researchers managing LLM deployments, providing a unified interface to various inference engines.
How It Works
Olla acts as an intelligent intermediary, directing LLM requests to suitable inference backends. It features two proxy engines: Sherpa (simple) and Olla (advanced, with circuit breakers/connection pooling). Key functions include unifying model discovery across providers and KV-cache-aware affinity routing for sticky sessions. Automatic failover, retries, and continuous health monitoring ensure high availability.
Quick Start & Requirements
Installation options include a bash script (curl -s https://raw.githubusercontent.com/thushan/olla/main/install.sh | bash), Docker (docker run -p 40114:40114 ghcr.io/thushan/olla:latest), Go (go install github.com/thushan/olla@latest), or building from source. No specific non-default hardware or software prerequisites are detailed beyond standard OS and Docker support. Full documentation is available at https://thushan.github.io/olla/.
Highlighted Details
Maintenance & Community
Developed by TensorFoundry. Key links include GitHub issues (https://github.com/thushan/olla/issues) and releases (https://github.com/thushan/olla/releases). No specific community chat channels are mentioned.
Licensing & Compatibility
Licensed under the Apache License 2.0, permissive for commercial use. Supports Linux, macOS, Windows, and Docker across AMD64 and ARM64 architectures.
Limitations & Caveats
The Anthropic Messages API translation is noted as "still actively being improved." Users may face limitations with highly custom or unsupported inference engines, potentially requiring manual integration efforts.
1 day ago
Inactive
workweave
lightseekorg
mostlygeek