openmodelz  by tensorchord

CLI tool for autoscaling LLM inference on Kubernetes (and other clusters)

created 2 years ago
270 stars

Top 95.9% on sourcepulse

Project Summary

OpenModelZ (mdz) simplifies deploying machine learning models, particularly LLMs, by automating infrastructure setup for data scientists and SREs. It scales deployments from zero to multiple replicas, supports several inference frameworks (vLLM, Triton, custom), and integrates with prototyping tools such as Gradio and Streamlit. Each deployment gets its own public subdomain.

How It Works

OpenModelZ leverages a k3s-based architecture for its data plane, managing servers across VMs, bare-metal, or even single machines. The control plane handles deployments, routing requests via a load balancer and scaling inference servers based on workload. It utilizes wildcard DNS (sslip.io) for automatically provisioned public subdomains, making deployments easily accessible without manual configuration.
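
To make the wildcard-DNS step concrete: sslip.io resolves any hostname that embeds an IPv4 address (for example `app.1.2.3.4.sslip.io`) to that address, so no DNS record has to be created per deployment. The sketch below builds and parses such a hostname. The helper names and the `<deployment>.<ip>.sslip.io` layout are assumptions for illustration, not OpenModelZ's confirmed naming scheme.

```python
import ipaddress
import re

SSLIP_SUFFIX = ".sslip.io"


def sslip_hostname(deployment: str, public_ip: str) -> str:
    """Build a hostname that sslip.io resolves to public_ip (hypothetical layout)."""
    ipaddress.IPv4Address(public_ip)  # raises ValueError on a malformed address
    return f"{deployment}.{public_ip}{SSLIP_SUFFIX}"


def embedded_ip(hostname: str) -> str:
    """Extract the dotted IPv4 address embedded in an sslip.io hostname."""
    match = re.search(r"(\d{1,3}(?:\.\d{1,3}){3})\.sslip\.io$", hostname)
    if match is None:
        raise ValueError(f"no embedded IPv4 address in {hostname!r}")
    # Re-validate so that out-of-range octets such as 999 are rejected.
    return str(ipaddress.IPv4Address(match.group(1)))


# 203.0.113.10 is a documentation-range address used purely as an example.
host = sslip_hostname("demo", "203.0.113.10")
print(host)
print(embedded_ip(host))
```

Because the address is carried in the name itself, the same code works on a laptop with a private IP or on a server with a public one.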

Quick Start & Requirements

  • Install: pip install openmodelz
  • Bootstrap server: mdz server start [public_ip] (requires root for port 80)
  • Deploy model: mdz deploy --image <docker_image> --name <deployment_name> --port <container_port> [--gpu <count>] [--node-labels <labels>]
  • Documentation: ROADMAP
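
As a rough illustration of the deploy step, the sketch below assembles an `mdz deploy` command line from the flags listed above. The helper name `mdz_deploy_command`, the image `example/llm-server:latest`, and the deployment name `llm-demo` are hypothetical; only the flag names come from the list. The command is printed rather than executed, so it can be inspected before running.

```python
import shlex
from typing import List, Optional


def mdz_deploy_command(
    image: str,
    name: str,
    port: int,
    gpu: Optional[int] = None,
    node_labels: Optional[str] = None,
) -> List[str]:
    """Assemble an `mdz deploy` argv list from the flags in the Quick Start."""
    argv = ["mdz", "deploy", "--image", image, "--name", name, "--port", str(port)]
    # --gpu and --node-labels are optional in the Quick Start, so add them
    # only when a value was supplied.
    if gpu is not None:
        argv += ["--gpu", str(gpu)]
    if node_labels:
        argv += ["--node-labels", node_labels]
    return argv


# Hypothetical image and deployment name, for illustration only.
cmd = mdz_deploy_command("example/llm-server:latest", "llm-demo", 8080, gpu=1)
print(shlex.join(cmd))
```

Passing the resulting list to `subprocess.run(cmd, check=True)` would execute it, assuming `mdz` is installed and a server has been bootstrapped.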

Highlighted Details

  • Autoscaling from zero replicas based on workload.
  • Supports deploying any ML framework via Docker images.
  • Provides integrated support for Gradio, Streamlit, and Jupyter notebooks.
  • Enables scaling from a single machine to a cluster.
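
The scale-from-zero behavior can be pictured with a simple load-based rule. The sketch below is a generic illustration, not OpenModelZ's actual algorithm: the in-flight-request metric, the per-replica target, and the function name `desired_replicas` are all assumptions.

```python
import math


def desired_replicas(
    inflight_requests: int,
    target_per_replica: int,
    max_replicas: int,
    min_replicas: int = 0,
) -> int:
    """Return a replica count for the current load (hypothetical rule)."""
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    # Enough replicas so each one handles at most target_per_replica requests.
    needed = math.ceil(inflight_requests / target_per_replica)
    # Clamp to the configured bounds; min_replicas=0 permits scaling to zero.
    return max(min_replicas, min(needed, max_replicas))


for load in (0, 1, 25, 1000):
    print(load, desired_replicas(load, target_per_replica=10, max_replicas=5))
```

With `min_replicas=0`, an idle deployment drops to zero replicas and the first incoming request brings it back to one.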

Maintenance & Community

  • Contributors include Ce Gao, Jinjing Zhou, and Keming; the last commit was 1 year ago.
  • Community channels available via Discord.

Licensing & Compatibility

  • Licensed under Apache 2.0.
  • Compatible with commercial and closed-source applications.

Limitations & Caveats

The README references agent version v0.0.13 (July 2023), so features may be outdated and bugs may be unaddressed. The project is inspired by k3s and OpenFaaS, which suggests it depends on their underlying technologies.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 5 stars in the last 90 days
