ome by sgl-project

Kubernetes operator for LLM serving

Created 4 months ago
271 stars

Top 95.0% on SourcePulse

Project Summary

OME addresses the complex challenge of managing and serving Large Language Models (LLMs) within Kubernetes environments. It targets engineers and researchers who need robust, automated solutions for deploying and optimizing LLM inference at scale, offering benefits such as improved resource utilization and reduced operational overhead.

How It Works

OME operates as a Kubernetes operator, leveraging custom resources to define and manage LLMs as first-class citizens. It automates model parsing to extract critical metadata, intelligently selects optimal serving runtimes (like SGLang or Triton) based on model characteristics and weighted scoring, and orchestrates sophisticated deployment patterns. This approach optimizes GPU bin-packing and enables dynamic re-optimization for efficient resource utilization and high availability.
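
To make the custom-resource model concrete, below is a minimal sketch of declaring a model for serving. OME's actual CRD schema is not shown on this page, so the API group, version, kind, and field names are illustrative assumptions, not OME's documented API:

# Hypothetical manifest; group, kind, and all fields are assumptions.
kubectl apply -f - <<'EOF'
apiVersion: ome.io/v1beta1            # assumed group/version
kind: InferenceService                # assumed kind
metadata:
  name: llama-3-8b-demo
  namespace: ome
spec:
  model:
    name: llama-3-8b                  # operator parses metadata from the weights
  runtime:
    autoSelect: true                  # let the operator score runtimes (e.g. SGLang)
EOF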

Quick Start & Requirements

  • Installation: Recommended via OCI Registry or Helm repository.
    • OCI:
        helm upgrade --install ome-crd oci://ghcr.io/moirai-internal/charts/ome-crd --namespace ome --create-namespace
        helm upgrade --install ome oci://ghcr.io/moirai-internal/charts/ome-resources --namespace ome
    • Helm: helm repo add ome https://sgl-project.github.io/ome, then helm repo update, then install the CRD and resource charts (a full sequence is sketched after this list).
  • Prerequisites: Kubernetes 1.28 or newer.
  • Documentation: OME concepts, Common tasks, Installation guide.
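
For the Helm-repository path, the complete sequence would look roughly like the following. The chart names ome-crd and ome-resources are assumptions inferred from the OCI charts above; verify them against the installation guide:

# Chart names are inferred from the OCI method above, not confirmed here.
helm repo add ome https://sgl-project.github.io/ome
helm repo update
helm upgrade --install ome-crd ome/ome-crd --namespace ome --create-namespace
helm upgrade --install ome ome/ome-resources --namespace ome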

Highlighted Details

  • Supports advanced deployment patterns including prefill-decode disaggregation and multi-node inference.
  • Integrates deeply with Kubernetes components like Kueue, LeaderWorkerSet, KEDA, and Gateway API.
  • Features automated benchmarking via the BenchmarkJob custom resource for performance evaluation (a sketch follows this list).
  • Provides first-class support for SGLang, an advanced inference engine.
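
As a sketch of the benchmarking feature: only the BenchmarkJob kind is named on this page, so the group, version, and every field below are hypothetical placeholders rather than OME's documented schema:

# Hypothetical BenchmarkJob; all fields are illustrative assumptions.
kubectl apply -f - <<'EOF'
apiVersion: ome.io/v1beta1                # assumed group/version
kind: BenchmarkJob
metadata:
  name: llama-3-8b-bench
  namespace: ome
spec:
  endpoint:
    inferenceService: llama-3-8b-demo     # target service (assumed field)
  task: text-to-text                      # workload shape (assumed field)
  maxTimePerIteration: 10m                # run bound (assumed field)
EOF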

Maintenance & Community

  • Active development with a roadmap prioritizing enhanced model parsing and quantization support.
  • Community support via GitHub Issues for bug reports and feature requests.

Licensing & Compatibility

  • Licensed under the MIT License, generally permissive for commercial use and integration.

Limitations & Caveats

  • The project is actively developed, with ongoing work to support additional model families and quantization workflows.

Health Check

  • Last Commit: 17 hours ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 27
  • Issues (30d): 5
  • Star History: 48 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Luis Capelo (Cofounder of Lightning AI), and 3 more.

LitServe by Lightning-AI

Top 0.3% on SourcePulse · 4k stars
AI inference pipeline framework
Created 1 year ago · Updated 1 day ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

serve by pytorch

Top 0.1% on SourcePulse · 4k stars
Serve, optimize, and scale PyTorch models in production
Created 6 years ago · Updated 1 month ago
Starred by Luis Capelo (Cofounder of Lightning AI), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 4 more.

ktransformers by kvcache-ai

Top 0.3% on SourcePulse · 15k stars
Framework for LLM inference optimization experimentation
Created 1 year ago · Updated 2 days ago