openmodelz by tensorchord

CLI tool for autoscaling LLM inference on Kubernetes (and other clusters)

Created 2 years ago
274 stars

Top 94.3% on SourcePulse

Project Summary

OpenModelZ (mdz) simplifies the deployment of machine learning models, particularly LLMs, by automating infrastructure setup so that data scientists and SREs do not have to configure it by hand. It scales deployments from zero to multiple replicas, supports several inference frameworks (vLLM, Triton, or custom servers), and integrates with prototyping tools such as Gradio and Streamlit, giving each deployment its own public subdomain.

How It Works

OpenModelZ uses a k3s-based architecture for its data plane, which can manage servers on VMs, bare metal, or a single machine. The control plane handles deployments: it routes requests through a load balancer and scales inference servers up or down with the workload. Public subdomains are provisioned automatically through wildcard DNS (sslip.io), so deployments are reachable without manual DNS configuration.
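The wildcard-DNS idea can be illustrated with a small model of how sslip.io resolves names: the target IPv4 address is embedded in the hostname itself, so any prefix label (such as a deployment name) still resolves to that address and no DNS record has to be created. This is a simplified sketch of the dotted-IP form only; the `<name>.<ip>.sslip.io` layout and both helper functions are hypothetical and are not taken from the openmodelz source.

```python
import ipaddress

SSLIP_SUFFIX = ".sslip.io"


def public_hostname(deployment: str, public_ip: str) -> str:
    """Build a wildcard-DNS hostname (hypothetical '<name>.<ip>.sslip.io' layout)."""
    ipaddress.IPv4Address(public_ip)  # raises ValueError on a malformed address
    return f"{deployment}.{public_ip}{SSLIP_SUFFIX}"


def resolve_like_sslip(hostname: str) -> str:
    """Return the IPv4 address embedded in a dotted-form sslip.io hostname.

    Simplified model: only the last four labels before '.sslip.io' matter,
    so prefixes such as a deployment name are ignored.
    """
    if not hostname.endswith(SSLIP_SUFFIX):
        raise ValueError(f"not an sslip.io name: {hostname}")
    labels = hostname[: -len(SSLIP_SUFFIX)].split(".")
    if len(labels) < 4:
        raise ValueError(f"no dotted IPv4 address in {hostname}")
    # IPv4Address raises ValueError if the last four labels are not valid octets.
    return str(ipaddress.IPv4Address(".".join(labels[-4:])))
```

For example, `public_hostname("llama", "203.0.113.7")` yields `llama.203.0.113.7.sslip.io`, and that name resolves back to `203.0.113.7` regardless of the `llama` prefix.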

Quick Start & Requirements

  • Install: pip install openmodelz
  • Bootstrap server: mdz server start [public_ip] (requires root for port 80)
  • Deploy model: mdz deploy --image <docker_image> --name <deployment_name> --port <container_port> [--gpu <count>] [--node-labels <labels>]
  • Documentation: ROADMAP
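The deploy step can also be scripted from Python. This minimal sketch only shells out to the `mdz` CLI using the flags listed above; the wrapper functions and any image or deployment names passed to them are hypothetical, and the accepted syntax for `--node-labels` is not specified here, so the value is passed through unchanged.

```python
import shutil
import subprocess
from typing import Optional


def build_deploy_args(
    image: str,
    name: str,
    port: int,
    gpu: Optional[int] = None,
    node_labels: Optional[str] = None,
) -> list[str]:
    """Assemble an `mdz deploy` command line from the documented flags."""
    args = ["mdz", "deploy", "--image", image, "--name", name, "--port", str(port)]
    if gpu is not None:
        args += ["--gpu", str(gpu)]
    if node_labels is not None:
        # Passed through verbatim; label syntax is an assumption left to the caller.
        args += ["--node-labels", node_labels]
    return args


def deploy(**kwargs) -> subprocess.CompletedProcess:
    """Run `mdz deploy`, failing early if the CLI is not on PATH."""
    if shutil.which("mdz") is None:
        raise FileNotFoundError("mdz not found; install it with `pip install openmodelz`")
    return subprocess.run(
        build_deploy_args(**kwargs), check=True, capture_output=True, text=True
    )
```

Building the argument list separately from running it keeps the command inspectable and testable without a live cluster; `check=True` turns a non-zero exit from `mdz` into a `CalledProcessError`.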

Highlighted Details

  • Autoscaling from zero replicas based on workload.
  • Supports deploying any ML framework via Docker images.
  • Provides integrated support for Gradio, Streamlit, and Jupyter notebooks.
  • Enables scaling from a single machine to a cluster.
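Scale-from-zero autoscaling generally works by sizing replicas to the current load and dropping to zero after an idle window. The sketch below shows that generic heuristic under stated assumptions; it is not openmodelz's actual scaling algorithm, and the `ScalingPolicy` fields and their defaults are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # Hypothetical knobs for illustration, not openmodelz settings.
    target_inflight_per_replica: int = 4
    max_replicas: int = 5
    idle_seconds_before_zero: float = 300.0


def desired_replicas(
    inflight: int, idle_seconds: float, current: int, policy: ScalingPolicy
) -> int:
    """Return the replica count a generic scale-from-zero controller would request."""
    if inflight > 0:
        # Enough replicas to keep each one at or below its in-flight target,
        # capped at the configured maximum.
        wanted = math.ceil(inflight / policy.target_inflight_per_replica)
        return min(wanted, policy.max_replicas)
    # No traffic: keep the current replicas until the idle window elapses, then go to zero.
    if idle_seconds >= policy.idle_seconds_before_zero:
        return 0
    return current
```

With the defaults, nine in-flight requests map to three replicas, and an idle deployment is scaled to zero after five minutes. In scale-from-zero designs the first request after an idle period waits for a replica to start, which is the usual cost of reclaiming idle GPUs.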

Maintenance & Community

  • Contributors include Ce Gao, Jinjing Zhou, and Keming.
  • Community channels available via Discord.

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • Compatible with commercial and closed-source applications.

Limitations & Caveats

The agent version referenced in the README is v0.0.13 (July 2023), and the last commit was a year ago, so some features may be outdated and known bugs may remain unaddressed. The project cites k3s and OpenFaaS as inspirations, and its data plane is built on k3s, so it inherits that dependency.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (last 30 days): 0
  • Issues (last 30 days): 1
  • Star history: 4 stars in the last 30 days

Explore Similar Projects

Starred by Eugene Yan (AI Scientist at AWS), Jared Palmer (Ex-VP AI at Vercel; Founder of Turborepo; Author of Formik, TSDX), and 4 more.

seldon-core by SeldonIO

0.2%
5k
MLOps framework for production model deployment on Kubernetes
Created 7 years ago
Updated 14 hours ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

serve by pytorch

0.1%
4k
Serve, optimize, and scale PyTorch models in production
Created 6 years ago
Updated 1 month ago