mosec by mosecorg

ML model serving framework for efficient cloud deployment

Created 4 years ago
859 stars

Top 41.7% on SourcePulse

View on GitHub
Project Summary

Mosec is a high-performance, flexible model serving framework designed for building ML-enabled backend services. It targets ML engineers and researchers who need to efficiently deploy trained models as APIs, offering dynamic batching and CPU/GPU pipeline support to maximize hardware utilization.

How It Works

Mosec leverages Rust for its web layer and task coordination, ensuring high performance and efficient CPU utilization via async I/O. It supports dynamic batching to aggregate requests for batched inference and allows for pipelined stages using multiple processes to handle mixed CPU/GPU/IO workloads. The framework is cloud-friendly, featuring model warmup, graceful shutdown, and Prometheus monitoring metrics, making it easily manageable by container orchestration systems like Kubernetes.
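The dynamic batching described above can be sketched in plain Python (a simplified illustration, not mosec's actual Rust implementation): requests accumulate in a queue, and a batch is flushed either when `max_batch_size` is reached or when `max_wait_time` has elapsed since the first request arrived.

```python
import queue
import time

def collect_batch(req_queue, max_batch_size, max_wait_time):
    """Drain requests into one batch: flush when the batch is full
    or when max_wait_time (seconds) has passed since the first request."""
    batch = [req_queue.get()]  # block until at least one request arrives
    deadline = time.monotonic() + max_wait_time
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # waited long enough; serve a partial batch
        try:
            batch.append(req_queue.get(timeout=remaining))
        except queue.Empty:
            break
    return batch

# Example: 5 queued requests with the batch size capped at 4.
q = queue.Queue()
for i in range(5):
    q.put(i)
print(collect_batch(q, max_batch_size=4, max_wait_time=0.01))  # → [0, 1, 2, 3]
```

The trade-off is visible in the two parameters: a larger `max_wait_time` yields fuller batches (better GPU utilization) at the cost of per-request latency.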

Quick Start & Requirements

  • Install via pip: pip install -U mosec or conda install conda-forge::mosec.
  • Requires Python 3.7+.
  • Building from source requires Rust.
  • See examples for detailed usage: https://mosecorg.github.io/mosec/
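Per the project docs, a service is defined by subclassing a worker class and implementing a `forward` method that receives one item, or a list of items when batching is enabled. The sketch below imitates that shape with a hypothetical stand-in base class (not mosec's real classes) so it runs without mosec installed; see the linked examples for the actual API.

```python
# Hypothetical stand-in for mosec's worker base class, used only to
# illustrate the forward() contract described in the docs.
class Worker:
    def forward(self, data):
        raise NotImplementedError

class Echo(Worker):
    def forward(self, data):
        # With batching enabled, data is a list; otherwise a single item.
        if isinstance(data, list):
            return [{"echo": d} for d in data]
        return {"echo": data}

worker = Echo()
print(worker.forward([{"x": 1}, {"x": 2}]))  # → [{'echo': {'x': 1}}, {'echo': {'x': 2}}]
```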

Highlighted Details

  • Dynamic batching with configurable max_batch_size and max_wait_time.
  • Supports multiple serialization formats (JSON, Msgpack) and custom mixins.
  • Enables multi-stage pipelines for complex workflows.
  • Includes Prometheus metrics for monitoring service health and performance.
  • Offers GPU offloading and customized GPU allocation.
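The serialization mixins mentioned above can be pictured as small classes that override how request bytes are decoded and response bytes are encoded. A simplified, stdlib-only sketch of that pattern (JSON only, since msgpack is a third-party package; method names are illustrative, not necessarily mosec's):

```python
import json

class JSONSerdeMixin:
    """Illustrative mixin: decode request bytes, encode response bytes."""
    def deserialize(self, data: bytes):
        return json.loads(data)

    def serialize(self, data) -> bytes:
        return json.dumps(data).encode("utf-8")

class Predictor(JSONSerdeMixin):
    def forward(self, data):
        # Trivial "model": score is the length of the input text.
        return {"score": len(data.get("text", ""))}

p = Predictor()
req = p.deserialize(b'{"text": "hello"}')
resp = p.serialize(p.forward(req))
print(resp)  # → b'{"score": 5}'
```

Swapping the mixin swaps the wire format without touching the inference logic.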

Maintenance & Community

  • Active development with contributions from multiple authors.
  • Community support available via Discord.
  • Used by companies like TencentCloud, Modelz, and TensorChord.

Licensing & Compatibility

  • The license is not explicitly stated in the README.

Limitations & Caveats

  • The README does not specify the license, which could be a blocker for commercial adoption.
  • For multi-stage services, passing extremely large data between stages via default serialization might slow down the pipeline.
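The inter-stage cost in the last point is easy to make concrete with stdlib pickle: a large payload must be fully serialized and copied at every process boundary (illustration only; mosec's actual inter-process format may differ).

```python
import pickle

# A large inter-stage payload, e.g. a million raw float features.
payload = [float(i) for i in range(1_000_000)]
blob = pickle.dumps(payload)
print(f"{len(blob) / 1e6:.1f} MB crosses each stage boundary")
```

Passing compact references (file paths, shared-memory handles) between stages instead of raw tensors is the usual mitigation.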
Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 5
  • Issues (30d): 2
  • Star History: 7 stars in the last 30 days

Explore Similar Projects

Starred by Carol Willing (Core Contributor to CPython, Jupyter), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 9 more.

dynamo by ai-dynamo

Top 1.0% · 5k stars
Inference framework for distributed generative AI model serving
Created 6 months ago · Updated 15 hours ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 3 more.

serve by pytorch

Top 0.1% · 4k stars
Serve, optimize, and scale PyTorch models in production
Created 6 years ago · Updated 1 month ago