smg by lightseekorg

High-performance LLM gateway for diverse inference backends

Created 6 months ago
284 stars

Top 92.0% on SourcePulse

Project Summary

Shepherd Model Gateway (SMG) is a high-performance, engine-agnostic LLM gateway built in Rust. It addresses the complexity of running large-scale LLM deployments by centralizing worker lifecycle management and load balancing traffic across diverse HTTP, gRPC, and OpenAI-compatible backends. SMG gives teams enterprise-grade control over chat history storage, privacy, and custom logic, benefiting anyone who wants efficient, unified, and observable LLM infrastructure.

How It Works

SMG is written in native Rust for speed and features a gRPC pipeline and sub-millisecond routing decisions. Its core differentiator is "cache-aware routing": the router takes the KV cache state of inference engines (SGLang, vLLM, TensorRT-LLM) into account so that requests sharing a prompt prefix can reuse already computed work, which maximizes GPU utilization and cuts redundant computation. It exposes a single, unified API endpoint that routes requests to self-hosted models or cloud providers, simplifying integration and hiding differences between backends.
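The cache-aware idea can be illustrated with a simplified Python sketch. This is not SMG's implementation, whose algorithm the summary does not describe: `Worker`, `pick_worker`, and the `min_match_ratio` threshold are hypothetical names, and each worker's cache is approximated by the prompts it has already served rather than by querying real KV cache state. A request goes to the worker with the longest shared prompt prefix, falling back to the least-loaded worker when no match is long enough.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical inference worker; served_prompts stands in for its KV cache."""
    url: str
    active_requests: int = 0
    served_prompts: list[str] = field(default_factory=list)


def common_prefix_len(a: str, b: str) -> int:
    """Number of leading characters shared by a and b."""
    n = min(len(a), len(b))
    i = 0
    while i < n and a[i] == b[i]:
        i += 1
    return i


def pick_worker(workers: list[Worker], prompt: str, min_match_ratio: float = 0.5) -> Worker:
    """Hypothetical cache-aware choice: prefer the longest prefix match (likely warm
    cache), else fall back to the worker with the fewest active requests."""
    best, best_len = None, 0
    for w in workers:
        match = max((common_prefix_len(prompt, p) for p in w.served_prompts), default=0)
        if match > best_len:
            best, best_len = w, match
    # A short match is not worth the load imbalance; balance by load instead.
    if best is None or best_len < min_match_ratio * len(prompt):
        best = min(workers, key=lambda w: w.active_requests)
    best.served_prompts.append(prompt)  # the chosen worker now "caches" this prefix
    return best
```

With a shared system prompt, consecutive requests land on the same worker, while an unrelated prompt goes to whichever worker is least loaded. A production router would likely keep a prefix tree per worker and evict old entries; the linear scan and unbounded list here only keep the sketch short.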

Quick Start & Requirements

  • Installation: Docker (docker pull lightseekorg/smg:latest), Python (pip install smg), or Rust (cargo install smg).
  • Prerequisites: Standard development environments for Docker, Python, or Rust. No specific hardware or software dependencies beyond the chosen installation method are detailed.
  • Links: Official documentation and guides are referenced implicitly within the README.

Highlighted Details

  • Performance: Built with native Rust, featuring a gRPC pipeline, sub-millisecond routing, zero-copy tokenization, circuit breakers, and automatic failover.
  • Routing Flexibility: Supports 8 routing policies, including cache_aware for KV cache optimization, prefix_hash, consistent_hashing, and round_robin.
  • Broad Backend Support: Integrates with self-hosted engines (vLLM, SGLang, TensorRT-LLM, Ollama, OpenAI-compatible) and cloud providers (OpenAI, Anthropic, Gemini, Bedrock, Azure OpenAI).
  • Enterprise Features: Offers multi-tenant rate limiting with OIDC, WASM plugins for custom logic, pluggable chat history storage (PostgreSQL, Oracle, Redis, in-memory), and high-availability mesh networking.
  • Observability: Provides 40+ Prometheus metrics, OpenTelemetry tracing, and structured JSON logs for detailed monitoring.
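Two of the named routing policies, consistent_hashing and round_robin, can be sketched in Python as follows. The policy names come from the summary; the hashing scheme (SHA-256 truncated to 64 bits, 100 virtual nodes per worker) and the worker names are assumptions for illustration, not SMG's configuration.

```python
import bisect
import hashlib
import itertools
from typing import Iterator


def stable_hash(key: str) -> int:
    """64-bit hash that is stable across processes (built-in hash() is salted)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class ConsistentHashRing:
    """consistent_hashing sketch: map a routing key (e.g. a session ID) to a worker.

    Assumed scheme: SHA-256 and 100 virtual nodes per worker, chosen for illustration.
    """

    def __init__(self, workers: list[str], vnodes: int = 100):
        # Each worker owns `vnodes` points on the ring, which evens out the load.
        self._ring = sorted(
            (stable_hash(f"{w}#{i}"), w) for w in workers for i in range(vnodes)
        )
        self._points = [h for h, _ in self._ring]

    def route(self, key: str) -> str:
        # Walk clockwise to the first point after the key's hash, wrapping at the end.
        idx = bisect.bisect(self._points, stable_hash(key)) % len(self._ring)
        return self._ring[idx][1]


def round_robin(workers: list[str]) -> Iterator[str]:
    """round_robin sketch: hand out workers in a fixed cycle, ignoring request content."""
    return itertools.cycle(workers)
```

Consistent hashing keeps a given key pinned to the same worker, and when a worker leaves, only the keys it owned move elsewhere. Round robin spreads load evenly but ignores request content, so on its own it cannot exploit cached prefixes.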
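Circuit breakers and automatic failover are standard resilience patterns; the Python sketch below shows one common three-state form (closed, open, half-open). The `CircuitBreaker` class, its thresholds, and the `choose_healthy` failover helper are hypothetical and do not reflect SMG's actual settings or API. An injectable clock makes the cooldown easy to test.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Hypothetical per-worker breaker: closed -> open after repeated failures,
    then half-open once a cooldown has elapsed."""

    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the breaker is closed

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.cooldown_s:
            return "half_open"  # cooldown elapsed: let trial traffic through
        return "open"

    def allow_request(self) -> bool:
        return self.state != "open"

    def record_success(self) -> None:
        # Any success closes the breaker and clears the failure count.
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        if self.state == "half_open":
            self._opened_at = self._clock()  # trial failed: reopen for another cooldown
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()


def choose_healthy(workers: list[str], breakers: dict[str, CircuitBreaker]) -> Optional[str]:
    """Hypothetical failover: return the first worker whose breaker admits traffic."""
    for worker in workers:
        if breakers[worker].allow_request():
            return worker
    return None
```

This simplified half-open state admits every request once the cooldown ends; many implementations admit only a limited number of trial requests before fully closing.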
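Multi-tenant rate limiting is often implemented with one token bucket per tenant. The summary does not say which algorithm SMG uses, so the following Python sketch swaps in a plain token bucket; `TenantRateLimiter` and keying buckets by a tenant ID taken from a validated OIDC token are assumptions for illustration only.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class TokenBucket:
    tokens: float      # requests currently available to spend
    updated_at: float  # clock reading at the last refill


class TenantRateLimiter:
    """Hypothetical token-bucket limiter keyed by tenant ID.

    Assumption: tenant_id is taken from a validated OIDC token claim; how SMG
    actually identifies and limits tenants is not described in the summary.
    """

    def __init__(self, capacity: float, refill_per_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity          # maximum burst per tenant
        self.refill_per_s = refill_per_s  # sustained requests per second per tenant
        self._clock = clock
        self._buckets: dict[str, TokenBucket] = {}

    def allow(self, tenant_id: str, cost: float = 1.0) -> bool:
        now = self._clock()
        # New tenants start with a full bucket.
        bucket = self._buckets.setdefault(tenant_id, TokenBucket(self.capacity, now))
        # Refill for the time elapsed since the last call, capped at capacity.
        bucket.tokens = min(self.capacity,
                            bucket.tokens + (now - bucket.updated_at) * self.refill_per_s)
        bucket.updated_at = now
        if bucket.tokens >= cost:
            bucket.tokens -= cost
            return True
        return False
```

Because each tenant has its own bucket, one tenant exhausting its budget does not throttle another. The sketch is single-threaded; a gateway serving concurrent requests would need locking or atomic updates around each bucket.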

Maintenance & Community

The project welcomes contributions and points to a "Contributing Guide." The provided text names no community channels (e.g., Discord, Slack) and gives no details on core maintainers, sponsorships, or a roadmap.

Licensing & Compatibility

The README does not specify the project's license or any compatibility notes for commercial use or closed-source linking.

Limitations & Caveats

The provided README does not detail specific limitations, known bugs, alpha status, or unsupported platforms. The complexity of configuring and managing diverse LLM backends and enterprise features may present a practical adoption hurdle.

Health Check

Last Commit: 11 hours ago
Responsiveness: Inactive
Pull Requests (30d): 155
Issues (30d): 8
Star History: 97 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and David Cramer (Cofounder of Sentry).

llmgateway by theopenco

1.2%
1k
LLM API gateway for unified provider access
Created 1 year ago
Updated 10 hours ago
Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Johannes Hagemann (Cofounder of Prime Intellect), and 3 more.

minions by HazyResearch

0.1%
1k
Communication protocol for cost-efficient LLM collaboration
Created 1 year ago
Updated 2 months ago
Starred by Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), Chaoyu Yang (Founder of Bento), and 3 more.

llm-d by llm-d

1.1%
3k
Kubernetes-native framework for distributed LLM inference
Created 1 year ago
Updated 21 hours ago