CLI tool for serving deep learning models from any ML/DL framework
Multi Model Server (MMS) is a tool for serving deep learning models trained with any framework, providing HTTP endpoints for inference requests. It targets ML engineers and researchers needing a flexible, easy-to-use inference server, simplifying deployment and scaling.
How It Works
MMS uses a worker-based architecture in which each worker handles model inference, and it can automatically scale the number of workers based on available CPU or GPU resources. Models are packaged into .mar archives, which contain the model artifacts and inference logic, making them easy to distribute and deploy.
Quick Start & Requirements
Install the server:

pip install multi-model-server

MXNet (CPU: mxnet-mkl, GPU: mxnet-cu92mkl) must be installed separately.

Start the server with a sample model:

multi-model-server --start --models squeezenet=https://s3.amazonaws.com/model-server/model_archive_1.0/squeezenet_v1.1.mar
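Once the server is running, inference requests go to its REST endpoint at /predictions/&lt;model_name&gt; on the inference port (8080 by default). A minimal helper for building that URL, assuming the default host and port:

```python
# Build the MMS inference endpoint URL for a served model.
# Defaults mirror MMS's standard inference address (127.0.0.1:8080);
# adjust host/port if the server is configured differently.

def prediction_url(model_name, host="127.0.0.1", port=8080):
    """Return the REST inference endpoint for a served model."""
    return "http://{}:{}/predictions/{}".format(host, port, model_name)

# A request can then be sent with any HTTP client, for example:
#   curl -X POST http://127.0.0.1:8080/predictions/squeezenet -T kitten.jpg
```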
Highlighted Details
Models are packaged as .mar archives for easy deployment.

Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
MMS does not provide built-in authentication, request throttling, or SSL, so production deployments need external solutions for security. By default, network access is restricted to localhost. Windows support is experimental.
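One common way to cover the missing SSL and authentication is to place a reverse proxy in front of the server. The fragment below is an illustrative nginx sketch, not part of MMS; the certificate paths and credentials file are assumptions.

```nginx
# Illustrative only: TLS termination and basic auth in front of MMS.
server {
    listen 443 ssl;
    ssl_certificate     /etc/nginx/certs/mms.crt;   # hypothetical paths
    ssl_certificate_key /etc/nginx/certs/mms.key;

    location / {
        auth_basic           "MMS";
        auth_basic_user_file /etc/nginx/.htpasswd;  # hypothetical
        proxy_pass http://127.0.0.1:8080;           # MMS default inference port
    }
}
```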