paddler  by intentee

Load balancer for llama.cpp servers

created 1 year ago
799 stars

Top 45.0% on sourcepulse

Project Summary

Paddler is a stateful load balancer and reverse proxy designed specifically for llama.cpp servers. It addresses the limitations that traditional load balancing strategies have with AI workloads. It targets users running llama.cpp who need request distribution that accounts for llama.cpp's slot-based concurrency model, which improves resource utilization and scalability.

How It Works

Paddler uses a distributed, agent-based architecture. An agent runs alongside each llama.cpp instance, monitors its available "slots" (concurrent request processing units), and reports this state to the central Paddler balancer. The balancer uses the reported slot state to distribute incoming requests so that each llama.cpp server's capacity is well utilized. Tracking state matters because llama.cpp uses continuous batching: a stateless strategy such as round robin cannot tell which servers still have free slots.
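
The slot-aware routing described above can be sketched roughly as follows. This is a minimal illustration, not Paddler's actual implementation: the `AgentReport` type, the `pick_upstream` function, the example addresses, and the "most idle slots wins" rule are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentReport:
    """Hypothetical status report that an agent sends to the balancer."""
    llamacpp_addr: str      # address of the llama.cpp instance the agent watches
    slots_idle: int         # slots currently free to take a request
    slots_processing: int   # slots currently busy


def pick_upstream(reports: list[AgentReport]) -> Optional[str]:
    """Return the address of the instance with the most idle slots.

    Returns None when no instance has a free slot, so the caller can
    wait or buffer the request instead of overloading a busy server.
    """
    available = [r for r in reports if r.slots_idle > 0]
    if not available:
        return None
    return max(available, key=lambda r: r.slots_idle).llamacpp_addr


# Example state as three hypothetical agents might report it.
reports = [
    AgentReport("10.0.0.1:8080", slots_idle=0, slots_processing=4),
    AgentReport("10.0.0.2:8080", slots_idle=3, slots_processing=1),
    AgentReport("10.0.0.3:8080", slots_idle=1, slots_processing=3),
]
print(pick_upstream(reports))
```

With this state, a stateless round-robin balancer would still send every third request to the first instance even though all of its slots are busy; the slot-aware choice skips it.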

Quick Start & Requirements

  • Install by downloading pre-compiled binaries for Linux, macOS, or Windows from the releases page.
  • Requires llama.cpp servers to be running with the --slots flag enabled.
  • Agents require --external-llamacpp-addr, --local-llamacpp-addr, and --management-addr flags.
  • The balancer requires --management-addr and --reverseproxy-addr flags.
  • Setup consists of running the balancer and one agent per llama.cpp instance, with llama.cpp's slots endpoint enabled so the agents can read slot state.
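
As a rough sketch of how the flags above fit together, the snippet below assembles one balancer command line and one agent command line. The flag names come from the list above; the `paddler` binary name, the `balancer` and `agent` subcommands, the comments on what each address is for, and every address and port are assumptions made for illustration, so check the project's README for the exact invocation.

```python
import shlex

# All addresses are placeholders chosen for this example.
BALANCER_MANAGEMENT_ADDR = "127.0.0.1:8085"    # assumed: where agents report slot state
BALANCER_REVERSEPROXY_ADDR = "0.0.0.0:8080"    # assumed: where clients send requests
LLAMACPP_ADDR = "127.0.0.1:8088"               # assumed: a running llama.cpp server


def balancer_command() -> list[str]:
    # "paddler" and "balancer" are assumed names, not confirmed by the text above.
    return [
        "paddler", "balancer",
        "--management-addr", BALANCER_MANAGEMENT_ADDR,
        "--reverseproxy-addr", BALANCER_REVERSEPROXY_ADDR,
    ]


def agent_command() -> list[str]:
    # "agent" is an assumed subcommand. On a single host the external and
    # local llama.cpp addresses can be the same placeholder value.
    return [
        "paddler", "agent",
        "--external-llamacpp-addr", LLAMACPP_ADDR,
        "--local-llamacpp-addr", LLAMACPP_ADDR,
        "--management-addr", BALANCER_MANAGEMENT_ADDR,
    ]


# Print shell-quoted command lines for inspection; nothing is executed.
print(shlex.join(balancer_command()))
print(shlex.join(agent_command()))
```

Building the argument lists in code rather than running them keeps the sketch safe to execute anywhere; the printed lines can be adapted once the real subcommands and addresses are confirmed.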

Highlighted Details

  • State-aware load balancing based on llama.cpp slots.
  • Dynamic addition/removal of llama.cpp instances for autoscaling.
  • Request buffering to support scaling from zero hosts.
  • Built-in dashboard and StatsD metrics for monitoring.
  • AWS integration capabilities.
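
The request buffering mentioned above, which allows scaling from zero hosts, can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not Paddler's implementation: the `BufferingBalancer` class, its method names, and the `max_buffered` limit are hypothetical, and real buffering would likely also involve timeouts.

```python
from collections import deque


class BufferingBalancer:
    """Toy balancer that queues requests while no upstream is registered."""

    def __init__(self, max_buffered: int = 100) -> None:
        self.upstreams: list[str] = []
        self.buffer: deque[str] = deque()
        self.max_buffered = max_buffered
        self.dispatched: list[tuple[str, str]] = []  # (request, upstream) pairs

    def handle(self, request: str) -> bool:
        """Dispatch now if possible, otherwise buffer. False means rejected."""
        if self.upstreams:
            self._dispatch(request)
            return True
        if len(self.buffer) >= self.max_buffered:
            return False  # buffer is full, so the request is rejected
        self.buffer.append(request)
        return True

    def register_upstream(self, addr: str) -> None:
        """Called when an instance appears; flushes buffered requests in order."""
        self.upstreams.append(addr)
        while self.buffer:
            self._dispatch(self.buffer.popleft())

    def _dispatch(self, request: str) -> None:
        # Real routing would consider idle slots; this sketch uses the first upstream.
        self.dispatched.append((request, self.upstreams[0]))


balancer = BufferingBalancer()
balancer.handle("req-1")                      # no hosts yet: buffered
balancer.handle("req-2")                      # still buffered
balancer.register_upstream("10.0.0.1:8080")   # first host arrives: buffer is flushed
print(balancer.dispatched)
```

The point of the design is that clients see a delay rather than an error while an autoscaler brings up the first llama.cpp instance.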

Maintenance & Community

  • Actively maintained with recent updates including a TUI dashboard.
  • Discord community available at https://discord.gg/kysUzFqSCK.
  • Requires llama.cpp version b4027 or above.

Licensing & Compatibility

  • Licensed under the MIT License, permitting commercial use and closed-source linking.

Limitations & Caveats

  • The project was recently rewritten in Rust on top of the Pingora framework; starting with v1.0.0, the API is intended to remain stable until v2.0.0.
  • llama.cpp's /slots endpoint is not exposed by default because it can disclose sensitive information; it must be enabled explicitly with the --slots flag listed under Quick Start & Requirements.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull requests (30d): 12
  • Issues (30d): 7
  • Star history: 53 stars in the last 90 days
