intentee/paddler: Load balancer for llama.cpp servers
Top 29.9% on SourcePulse
Paddler is a stateful load balancer and reverse proxy designed specifically for llama.cpp servers, addressing the shortcomings of traditional load-balancing strategies for AI workloads. It targets users running llama.cpp who need request distribution that is aware of llama.cpp's slot-based concurrency model, enabling better resource utilization and scalability.
How It Works
Paddler employs a distributed, agent-based architecture. Agents run alongside each llama.cpp instance, monitoring its available "slots" (concurrent request-processing units) and reporting that state to the central Paddler balancer. The balancer then uses this slot-aware state to distribute incoming requests, keeping each llama.cpp server's capacity well utilized. This stateful approach suits llama.cpp's continuous batching, where stateless round-robin distribution falls short.
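The slot-aware selection described above can be pictured as choosing the instance with the most idle slots. The sketch below is illustrative only; the names (`Upstream`, `pick_upstream`) are hypothetical and do not reflect Paddler's actual internals.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Upstream:
    """State an agent reports for one llama.cpp instance (illustrative)."""
    addr: str
    slots_idle: int
    slots_total: int


def pick_upstream(upstreams: List[Upstream]) -> Optional[Upstream]:
    """Slot-aware selection: prefer the instance with the most idle slots.

    Returns None when every slot is busy; a stateful balancer can then
    buffer the request until an agent reports a freed slot.
    """
    available = [u for u in upstreams if u.slots_idle > 0]
    if not available:
        return None
    return max(available, key=lambda u: u.slots_idle)


fleet = [
    Upstream("10.0.0.1:8080", slots_idle=0, slots_total=4),
    Upstream("10.0.0.2:8080", slots_idle=3, slots_total=4),
]
print(pick_upstream(fleet).addr)  # prints the address with free slots
```

A stateless round-robin balancer would still send every Nth request to the fully busy instance; tracking idle slots is what avoids that.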
Quick Start & Requirements
- Requires llama.cpp servers to be running with the --slots flag enabled.
- Agents are configured with the --external-llamacpp-addr, --local-llamacpp-addr, and --management-addr flags.
- The balancer is configured with the --management-addr and --reverseproxy-addr flags.
- Agents monitor each instance through llama.cpp's slot endpoint.
Highlighted Details
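An agent's monitoring loop boils down to polling a llama.cpp server's slot endpoint and reporting how many slots are free. The sketch below assumes the response is a JSON list of slot objects carrying an `is_processing` boolean, as in recent llama.cpp builds; field names can differ across llama.cpp versions, and this is not Paddler's actual agent code.

```python
import json
from typing import Tuple


def count_slots(slots_json: str) -> Tuple[int, int]:
    """Parse a /slots-style response and return (idle, total).

    Assumes each slot object has an "is_processing" boolean field;
    adjust the field name for your llama.cpp version if needed.
    """
    slots = json.loads(slots_json)
    idle = sum(1 for s in slots if not s.get("is_processing"))
    return idle, len(slots)


# Example response with one busy and two idle slots:
sample = json.dumps([
    {"id": 0, "is_processing": True},
    {"id": 1, "is_processing": False},
    {"id": 2, "is_processing": False},
])
idle, total = count_slots(sample)
print(idle, total)  # 2 3
```

The agent would periodically send these counts to the balancer's management address, which is what keeps the balancer's slot state current.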
- Stateful load balancing built around llama.cpp slots.
- Supports adding and removing llama.cpp instances for autoscaling.
Maintenance & Community
- Requires llama.cpp version b4027 or above.
Licensing & Compatibility
Limitations & Caveats
- The /slots endpoint requires explicit enablement via the --slots-endpoint-enable flag because it can disclose sensitive information.