paddler  by intentee

Load balancer for llama.cpp servers

Created 1 year ago
1,302 stars

Top 30.6% on SourcePulse

Project Summary

Paddler is a stateful load balancer and reverse proxy built specifically for llama.cpp servers. It addresses the shortcomings of traditional load balancing strategies for AI workloads by distributing requests according to llama.cpp's slot-based concurrency model, which improves resource utilization and scalability for users running llama.cpp.

How It Works

Paddler uses a distributed, agent-based architecture. An agent runs alongside each llama.cpp instance, monitors its available "slots" (the units that process concurrent requests), and reports that state to the central Paddler balancer. The balancer uses this slot state to route each incoming request to an instance with spare capacity. Stateless strategies cannot see slot occupancy, which is why a stateful approach suits llama.cpp's continuous batching.
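
As a rough illustration of slot-aware routing, the sketch below keeps the last slot counts each agent reported and picks the instance with the most idle slots. The names AgentReport and SlotAwareBalancer, the report fields, and the tie-breaking rule are assumptions made for this example, not Paddler's actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AgentReport:
    """Latest state an agent reported for one llama.cpp instance (hypothetical shape)."""
    addr: str
    slots_idle: int
    slots_processing: int


class SlotAwareBalancer:
    """Toy balancer that tracks per-instance slot state."""

    def __init__(self) -> None:
        self._reports: Dict[str, AgentReport] = {}

    def update(self, report: AgentReport) -> None:
        # Agents push their state; the newest report replaces the old one.
        self._reports[report.addr] = report

    def pick(self) -> Optional[str]:
        # Choose the instance with the most idle slots; None means every slot is busy.
        candidates = [r for r in self._reports.values() if r.slots_idle > 0]
        if not candidates:
            return None
        best = max(candidates, key=lambda r: r.slots_idle)
        # Reserve the slot locally so the next pick sees the updated count
        # before the agent's next report arrives.
        best.slots_idle -= 1
        best.slots_processing += 1
        return best.addr
```

A round-robin balancer would alternate between instances regardless of load; here an instance with no idle slots is never selected, and a return value of None signals that the request has to wait.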

Quick Start & Requirements

  • Install by downloading pre-compiled binaries for Linux, macOS, or Windows from the releases page.
  • Requires llama.cpp servers to be running with the --slots flag enabled.
  • Agents require --external-llamacpp-addr, --local-llamacpp-addr, and --management-addr flags.
  • The balancer requires --management-addr and --reverseproxy-addr flags.
  • Setup consists of running the balancer plus one agent alongside each llama.cpp instance, with that instance's /slots endpoint enabled so the agent can read it.
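
To give a sense of what an agent reads from the slot endpoint, the sketch below counts idle and busy slots in a /slots-style JSON payload. It assumes each slot object carries a boolean is_processing field; the sample payload is made up for illustration, and real responses contain more fields.

```python
import json
from typing import Tuple


def count_slots(payload: str) -> Tuple[int, int]:
    """Return (idle, processing) counts from a /slots-style JSON array."""
    slots = json.loads(payload)
    # A slot without the field is treated as idle (an assumption of this sketch).
    processing = sum(1 for slot in slots if slot.get("is_processing", False))
    return len(slots) - processing, processing


# Made-up sample payload; the field names are an assumption about the response shape.
SAMPLE = '[{"id": 0, "is_processing": true}, {"id": 1, "is_processing": false}]'
idle, processing = count_slots(SAMPLE)
print(f"idle={idle} processing={processing}")
```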

Highlighted Details

  • State-aware load balancing based on llama.cpp slots.
  • Dynamic addition/removal of llama.cpp instances for autoscaling.
  • Request buffering to support scaling from zero hosts.
  • Built-in dashboard and StatsD metrics for monitoring.
  • AWS integration capabilities.
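
The request buffering idea can be sketched as a bounded queue that holds requests while no instance has an idle slot and releases them once capacity appears. The class RequestBuffer, its size limit, and its timeout are hypothetical and chosen for illustration only; they do not describe how Paddler implements buffering.

```python
import time
from collections import deque
from typing import Deque, List, Optional, Tuple


class RequestBuffer:
    """Holds requests while there is no capacity (illustrative only)."""

    def __init__(self, max_size: int, timeout_s: float) -> None:
        self.max_size = max_size
        self.timeout_s = timeout_s
        self._queue: Deque[Tuple[float, str]] = deque()

    def push(self, request_id: str, now: Optional[float] = None) -> bool:
        # Reject when full so callers can fail fast instead of waiting forever.
        if len(self._queue) >= self.max_size:
            return False
        self._queue.append((time.monotonic() if now is None else now, request_id))
        return True

    def drain(self, idle_slots: int, now: Optional[float] = None) -> List[str]:
        # Release up to `idle_slots` requests in arrival order, dropping expired ones.
        now = time.monotonic() if now is None else now
        released: List[str] = []
        while self._queue and len(released) < idle_slots:
            queued_at, request_id = self._queue.popleft()
            if now - queued_at <= self.timeout_s:
                released.append(request_id)
        return released
```

With zero hosts running, drain(0) releases nothing and requests stay queued; once an autoscaler brings up an instance and its agent reports idle slots, the queued requests that have not timed out are forwarded.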

Maintenance & Community

  • Actively maintained with recent updates including a TUI dashboard.
  • Discord community available at https://discord.gg/kysUzFqSCK.
  • Requires llama.cpp version b4027 or above.

Licensing & Compatibility

  • Licensed under the MIT License, permitting commercial use and closed-source linking.

Limitations & Caveats

  • The project was recently rewritten in Rust on the Pingora framework; the API is expected to remain stable from v1.0.0 until v2.0.0.
  • The /slots endpoint is not exposed by default because it can disclose sensitive information; it must be explicitly enabled when starting llama.cpp (the --slots flag listed under Quick Start).

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull requests (30d): 2
  • Issues (30d): 0
  • Star history: 219 stars in the last 30 days
