mosec (by mosecorg)

ML model serving framework for efficient cloud deployment

created 4 years ago
849 stars

Top 42.9% on sourcepulse

Project Summary

Mosec is a high-performance, flexible model serving framework designed for building ML-enabled backend services. It targets ML engineers and researchers who need to efficiently deploy trained models as APIs, offering dynamic batching and CPU/GPU pipeline support to maximize hardware utilization.

How It Works

Mosec leverages Rust for its web layer and task coordination, ensuring high performance and efficient CPU utilization via async I/O. It supports dynamic batching to aggregate requests for batched inference and allows for pipelined stages using multiple processes to handle mixed CPU/GPU/IO workloads. The framework is cloud-friendly, featuring model warmup, graceful shutdown, and Prometheus monitoring metrics, making it easily manageable by container orchestration systems like Kubernetes.
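The dynamic batching behavior described above can be sketched in plain Python, independent of mosec's internals (which implement this in Rust): collect requests until either the batch is full or a wait deadline expires, then hand the batch to the model. The function and parameter names below mirror the configuration knobs but are otherwise hypothetical.

```python
import queue
import time

def collect_batch(requests: "queue.Queue", max_batch_size: int, max_wait_time: float):
    """Aggregate requests into one batch: flush when the batch reaches
    max_batch_size or when max_wait_time (seconds) has elapsed since
    the first request arrived."""
    batch = [requests.get()]  # block until at least one request arrives
    deadline = time.monotonic() + max_wait_time
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # waited long enough; serve a partial batch
        try:
            batch.append(requests.get(timeout=remaining))
        except queue.Empty:
            break
    return batch

# Toy usage: five queued requests, batch size capped at 4.
q = queue.Queue()
for i in range(5):
    q.put(i)
print(collect_batch(q, max_batch_size=4, max_wait_time=0.01))  # -> [0, 1, 2, 3]
```

The trade-off is the usual one: a larger max_wait_time improves batching (and GPU utilization) at the cost of per-request latency.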

Quick Start & Requirements

  • Install via pip: pip install -U mosec or conda install conda-forge::mosec.
  • Requires Python 3.7+.
  • Building from source requires Rust.
  • See examples for detailed usage: https://mosecorg.github.io/mosec/
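A minimal service, sketched against mosec's documented Server/Worker API (requires pip install mosec; the Echo worker itself is a made-up example):

```python
# Hypothetical echo service built on mosec's Server/Worker API.
from mosec import Server, Worker

class Echo(Worker):
    def forward(self, data: dict) -> dict:
        # With the default JSON serialization, `data` is the deserialized
        # request body; with max_batch_size > 1, forward receives a list.
        return {"echo": data.get("msg", "")}

if __name__ == "__main__":
    server = Server()
    server.append_worker(Echo, num=2)  # two worker processes
    server.run()
```

Run the script and query it with, e.g., curl http://127.0.0.1:8000/inference -d '{"msg": "hello"}' (port 8000 and the /inference endpoint are mosec's defaults).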

Highlighted Details

  • Dynamic batching with configurable max_batch_size and max_wait_time.
  • Supports multiple serialization formats (JSON, Msgpack) and custom mixins.
  • Enables multi-stage pipelines for complex workflows.
  • Includes Prometheus metrics for monitoring service health and performance.
  • Offers GPU offloading and customized GPU allocation.
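The multi-stage pipeline idea can be illustrated in plain Python with threads and queues. This is a conceptual sketch only: mosec actually runs each stage as separate worker processes coordinated by its Rust layer, so CPU-bound preprocessing and GPU-bound inference overlap instead of blocking each other.

```python
import queue
import threading

def stage(fn, inbox: queue.Queue, outbox: queue.Queue):
    """One pipeline stage: consume items from inbox, apply fn, emit to outbox.
    A None sentinel shuts the stage down and propagates downstream."""
    while True:
        item = inbox.get()
        if item is None:
            outbox.put(None)
            break
        outbox.put(fn(item))

# Two chained stages (stand-ins for preprocessing and inference).
q0, q1, q2 = queue.Queue(), queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=stage, args=(lambda x: x * 2, q0, q1)),
    threading.Thread(target=stage, args=(lambda x: x + 1, q1, q2)),
]
for t in threads:
    t.start()
for item in [1, 2, 3]:
    q0.put(item)
q0.put(None)  # signal end of input

results = []
while (out := q2.get()) is not None:
    results.append(out)
for t in threads:
    t.join()
print(results)  # -> [3, 5, 7]
```

Because the stages are decoupled by queues, each can be scaled independently, which is the same reasoning behind giving a CPU-heavy stage more workers than the GPU stage.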

Maintenance & Community

  • Active development with contributions from multiple authors.
  • Community support available via Discord.
  • Used by companies like TencentCloud, Modelz, and TensorChord.

Licensing & Compatibility

  • The license is not explicitly stated in the README.

Limitations & Caveats

  • The README does not specify the license, which could be a blocker for commercial adoption.
  • For multi-stage services, passing extremely large data between stages via default serialization might slow down the pipeline.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 3
  • Issues (30d): 0
  • Star history: 11 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems) and Jeff Hammerbacher (cofounder of Cloudera).

  • towhee by towhee-io: framework for neural data processing pipelines (3k stars, top 0.2% on sourcepulse; created 4 years ago, updated 9 months ago).