openmodelz by tensorchord

CLI tool for autoscaling LLM inference on Kubernetes (and other clusters)

Created 2 years ago

279 stars

Top 93.2% on SourcePulse

Project Summary

OpenModelZ (mdz) simplifies the deployment of machine learning models, particularly LLMs, by automating infrastructure setup for data scientists and SREs. It scales deployments from zero to multiple replicas, supports several inference frameworks (vLLM, Triton, custom), integrates with prototyping tools such as Gradio and Streamlit, and gives each deployment a public subdomain.

How It Works

OpenModelZ uses a k3s-based architecture for its data plane, managing servers across VMs, bare-metal hosts, or a single machine. The control plane handles deployments, routes requests through a load balancer, and scales inference servers based on workload. It relies on the wildcard DNS service sslip.io to provision a public subdomain for each deployment automatically, so deployments are reachable without manual DNS configuration.
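To illustrate the wildcard DNS idea, here is a minimal Python sketch. sslip.io resolves a hostname that embeds an IPv4 address (dotted or dashed) to that address, so no DNS record has to be created per deployment. The `<deployment>.<ip>.sslip.io` naming scheme and both function names are assumptions made for this example; the subdomain format that OpenModelZ actually generates may differ.

```python
import ipaddress
import re

SSLIP_SUFFIX = ".sslip.io"


def sslip_hostname(deployment: str, public_ip: str) -> str:
    """Build a hostname of the form <deployment>.<ip>.sslip.io.

    The naming scheme is hypothetical; it only illustrates how an IP
    address can be embedded in a wildcard DNS name.
    """
    ip = ipaddress.IPv4Address(public_ip)  # raises ValueError if malformed
    return f"{deployment}.{ip}{SSLIP_SUFFIX}"


def embedded_ip(hostname: str) -> str:
    """Extract the trailing IPv4 address embedded in an sslip.io hostname.

    This mimics locally what the DNS service does when it answers a query;
    it does not perform any network lookup.
    """
    if not hostname.endswith(SSLIP_SUFFIX):
        raise ValueError("not an sslip.io hostname")
    stem = hostname[: -len(SSLIP_SUFFIX)]
    # Accept both dotted (192.0.2.10) and dashed (192-0-2-10) forms.
    match = re.search(r"(\d{1,3}(?:[.-]\d{1,3}){3})$", stem)
    if match is None:
        raise ValueError("no embedded IPv4 address found")
    return str(ipaddress.IPv4Address(match.group(1).replace("-", ".")))


if __name__ == "__main__":
    host = sslip_hostname("demo", "192.0.2.10")
    print(host, "->", embedded_ip(host))
```

Because the address travels inside the name, a server bootstrapped with a public IP can hand out working URLs immediately.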

Quick Start & Requirements

Install: pip install openmodelz
Bootstrap server: mdz server start [public_ip] (requires root for port 80)
Deploy model: mdz deploy --image <docker_image> --name <deployment_name> --port <container_port> [--gpu <count>] [--node-labels <labels>]
Documentation: ROADMAP
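
The deploy command above can also be assembled programmatically, which is handy when deployments are scripted. The following is a sketch only: it uses just the flags listed in the Quick Start, the helper name `build_deploy_command` is hypothetical, and the image name and label value in the usage example are placeholders rather than values taken from the project.

```python
import shlex
from typing import List, Optional


def build_deploy_command(
    image: str,
    name: str,
    port: int,
    gpu: Optional[int] = None,
    node_labels: Optional[str] = None,
) -> List[str]:
    """Return the argument list for an `mdz deploy` invocation.

    Only the flags shown in the Quick Start are used. The optional flags
    are appended only when a value is supplied.
    """
    cmd = ["mdz", "deploy", "--image", image, "--name", name, "--port", str(port)]
    if gpu is not None:
        cmd += ["--gpu", str(gpu)]
    if node_labels:
        # The expected label syntax is not given in this summary; the value
        # is passed through unchanged.
        cmd += ["--node-labels", node_labels]
    return cmd


if __name__ == "__main__":
    # Placeholder image and deployment name, for illustration only.
    cmd = build_deploy_command("example.com/my-model:latest", "demo", 8080, gpu=1)
    print(shlex.join(cmd))  # pass `cmd` to subprocess.run(cmd, check=True) to execute
```

Returning a list rather than a single string avoids shell-quoting problems if the result is later passed to `subprocess.run`.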

Highlighted Details

Autoscaling from zero replicas based on workload.
Supports deploying any ML framework via Docker images.
Provides integrated support for Gradio, Streamlit, and Jupyter notebooks.
Enables scaling from a single machine to a cluster.
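
As a rough illustration of workload-based scaling from zero, the sketch below computes a replica count from the number of in-flight requests. This is a generic load-based rule written for this summary, not OpenModelZ's actual autoscaling algorithm; the function name, the parameters, and the per-replica target are all assumptions.

```python
import math


def desired_replicas(
    inflight_requests: int,
    target_per_replica: int,
    max_replicas: int,
    min_replicas: int = 0,
) -> int:
    """Return how many replicas a simple load-based autoscaler would request.

    With min_replicas=0 the deployment scales to zero when idle and back up
    to at least one replica as soon as a request arrives.
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    if inflight_requests <= 0:
        return min_replicas
    # Round up so that no replica exceeds its target load.
    wanted = math.ceil(inflight_requests / target_per_replica)
    # Clamp to the configured bounds.
    return max(min_replicas, min(wanted, max_replicas))


if __name__ == "__main__":
    for load in (0, 1, 25, 1000):
        print(load, "->", desired_replicas(load, target_per_replica=10, max_replicas=5))
```

A production autoscaler would typically also smooth the load signal and delay scale-down to avoid thrashing; those details are omitted here.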

Maintenance & Community

Contributors include Ce Gao, Jinjing Zhou, and Keming. The most recent commit was 2 years ago.
Community channels available via Discord.

Licensing & Compatibility

Licensed under Apache 2.0.
Compatible with commercial and closed-source applications.

Limitations & Caveats

The agent version mentioned in the README is v0.0.13 (July 2023), so features may be outdated and bugs may remain unaddressed. The project is inspired by k3s and OpenFaaS, which suggests it relies on their underlying technologies.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

1 star in the last 30 days