CLI tool for autoscaling LLM inference on Kubernetes (and other clusters)
OpenModelZ (mdz) simplifies the deployment of machine learning models, particularly LLMs, by automating infrastructure setup for data scientists and SREs. It enables effortless scaling from zero to multiple replicas, supports various inference frameworks (vLLM, Triton, custom), and integrates with prototyping tools like Gradio and Streamlit, providing accessible public subdomains for each deployment.
How It Works
OpenModelZ leverages a k3s-based architecture for its data plane, managing servers across VMs, bare-metal, or even single machines. The control plane handles deployments, routing requests via a load balancer and scaling inference servers based on workload. It utilizes wildcard DNS (sslip.io) for automatically provisioned public subdomains, making deployments easily accessible without manual configuration.
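The scale-from-zero behavior described above can be illustrated with a toy reconciliation function: given the observed request rate, decide how many replicas the next tick should run, dropping to zero when idle. This is an illustrative sketch only, not OpenModelZ's actual autoscaling algorithm; the `per_replica_capacity` and `max_replicas` parameters are hypothetical.

```python
import math

def desired_replicas(requests_per_sec: float,
                     per_replica_capacity: float = 10.0,
                     max_replicas: int = 5) -> int:
    """Toy autoscaler decision: replica count for the next tick.

    - No traffic -> scale to zero (the idle case OpenModelZ handles).
    - Otherwise, enough replicas to cover the load, capped at max_replicas.
    """
    if requests_per_sec <= 0:
        return 0  # scale to zero when the deployment is idle
    needed = math.ceil(requests_per_sec / per_replica_capacity)
    return max(1, min(needed, max_replicas))

print(desired_replicas(0))     # 0  (idle: scaled to zero)
print(desired_replicas(25))    # 3  (ceil(25 / 10) replicas)
print(desired_replicas(1000))  # 5  (capped at max_replicas)
```

A real control plane would smooth the request-rate signal and add cooldown windows to avoid flapping between zero and one replica on bursty traffic.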
Quick Start & Requirements
pip install openmodelz
mdz server start [public_ip]   (requires root to bind port 80)
mdz deploy --image <docker_image> --name <deployment_name> --port <container_port> [--gpu <count>] [--node-labels <labels>]
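After a deploy, each deployment is reachable on a public subdomain backed by sslip.io wildcard DNS, which resolves any hostname that embeds an IP address back to that IP. A minimal sketch of how such an endpoint URL can be derived follows; the exact naming scheme is an assumption, so check the URL that mdz actually prints after `mdz deploy`.

```python
def endpoint_url(deployment_name: str, server_ip: str) -> str:
    """Build a sslip.io-style public URL.

    sslip.io resolves <anything>.<ip>.sslip.io to <ip>, so a deployment
    gets a routable hostname without any manual DNS configuration.
    The <name>.<ip> layout here is an illustrative assumption.
    """
    return f"http://{deployment_name}.{server_ip}.sslip.io"

print(endpoint_url("gradio-demo", "192.0.2.10"))
# http://gradio-demo.192.0.2.10.sslip.io
```

The load balancer on the server then routes requests arriving at that hostname to the matching deployment's replicas.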
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The agent version mentioned in the README is v0.0.13 (July 2023), so some features may be outdated and known bugs may be unaddressed. The project is inspired by k3s and OpenFaaS and builds on their underlying technologies, so it inherits their operational characteristics and constraints.