mosec by mosecorg

ML model serving framework for efficient cloud deployment

Created 4 years ago
889 stars

Top 40.6% on SourcePulse

Project Summary

Mosec is a high-performance, flexible model serving framework designed for building ML-enabled backend services. It targets ML engineers and researchers who need to efficiently deploy trained models as APIs, offering dynamic batching and CPU/GPU pipeline support to maximize hardware utilization.

How It Works

Mosec leverages Rust for its web layer and task coordination, ensuring high performance and efficient CPU utilization via async I/O. It supports dynamic batching to aggregate requests for batched inference and allows for pipelined stages using multiple processes to handle mixed CPU/GPU/IO workloads. The framework is cloud-friendly, featuring model warmup, graceful shutdown, and Prometheus monitoring metrics, making it easily manageable by container orchestration systems like Kubernetes.
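The dynamic-batching trade-off described above (collect requests until the batch is full or a wait deadline expires) can be sketched in plain Python. This is an illustration of the idea only, not mosec's Rust implementation; the parameter names mirror mosec's max_batch_size and max_wait_time options:

```python
import queue
import time

def collect_batch(q, max_batch_size, max_wait_ms):
    """Drain up to max_batch_size items from q, waiting at most
    max_wait_ms after the first item arrives before giving up on
    filling the batch."""
    batch = [q.get()]  # block until at least one request arrives
    deadline = time.monotonic() + max_wait_ms / 1000.0
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break  # deadline hit with a partial batch
    return batch

# Five requests are already queued, so the first call returns a full
# batch of 4 and the second returns the remainder.
q = queue.Queue()
for i in range(5):
    q.put(i)
print(collect_batch(q, max_batch_size=4, max_wait_ms=10))  # [0, 1, 2, 3]
print(collect_batch(q, max_batch_size=4, max_wait_ms=10))  # [4]
```

A larger max_wait_time raises throughput (fuller batches) at the cost of per-request latency; mosec exposes exactly this knob per worker stage.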

Quick Start & Requirements

  • Install via pip: pip install -U mosec or conda install conda-forge::mosec.
  • Requires Python 3.7+.
  • Building from source requires Rust.
  • See examples for detailed usage: https://mosecorg.github.io/mosec/

Highlighted Details

  • Dynamic batching with configurable max_batch_size and max_wait_time.
  • Supports multiple serialization formats (JSON, Msgpack) and custom mixins.
  • Enables multi-stage pipelines for complex workflows.
  • Includes Prometheus metrics for monitoring service health and performance.
  • Offers GPU offloading and customized GPU allocation.

Maintenance & Community

  • Active development with contributions from multiple authors.
  • Community support available via Discord.
  • Used by companies like TencentCloud, Modelz, and TensorChord.

Licensing & Compatibility

  • The license is not explicitly stated in the README.

Limitations & Caveats

  • The README does not specify the license, which could be a blocker for commercial adoption.
  • For multi-stage services, passing extremely large data between stages via default serialization might slow down the pipeline.
Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 day
  • Pull requests (30d): 4
  • Issues (30d): 0
  • Star history: 7 stars in the last 30 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), and 3 more.

  • serve by pytorch — Serve, optimize, and scale PyTorch models in production. 4k stars; created 6 years ago; updated 5 months ago.