CLI tool for serving deep learning models from any ML/DL framework
Multi Model Server (MMS) is a tool for serving deep learning models trained with any framework, providing HTTP endpoints for inference requests. It targets ML engineers and researchers needing a flexible, easy-to-use inference server, simplifying deployment and scaling.
How It Works
MMS uses a worker-based architecture in which each worker handles model inference, and it can automatically scale the number of workers based on available CPU or GPU resources. Models are packaged into .mar archives, which contain the model artifacts and inference logic, making them easy to distribute and deploy.
Quick Start & Requirements
Install the server:

pip install multi-model-server

MXNet (CPU: mxnet-mkl, GPU: mxnet-cu92mkl) must be installed separately.

Start the server with a sample model:

multi-model-server --start --models squeezenet=https://s3.amazonaws.com/model-server/model_archive_1.0/squeezenet_v1.1.mar
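Once the server is running, inference requests go to its REST endpoint at /predictions/&lt;model_name&gt; on the inference port (8080 by default). A minimal helper for building that URL, assuming the default host and port:

```python
# Build the MMS inference endpoint URL for a served model.
# Defaults mirror MMS's standard inference address (127.0.0.1:8080);
# adjust host/port if the server is configured differently.

def prediction_url(model_name, host="127.0.0.1", port=8080):
    """Return the REST inference endpoint for a served model."""
    return "http://{}:{}/predictions/{}".format(host, port, model_name)

# A request can then be sent with any HTTP client, for example:
#   curl -X POST http://127.0.0.1:8080/predictions/squeezenet -T kitten.jpg
```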
Highlighted Details
Models are packaged as .mar archives for easy deployment.

Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
MMS does not provide built-in authentication, request throttling, or SSL, so production deployments need external solutions for security. By default, network access is restricted to localhost. Windows support is experimental.
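One common way to cover the missing SSL and authentication is to place a reverse proxy in front of the server. The fragment below is an illustrative nginx sketch, not part of MMS; the certificate paths and credentials file are assumptions.

```nginx
# Illustrative only: TLS termination and basic auth in front of MMS.
server {
    listen 443 ssl;
    ssl_certificate     /etc/nginx/certs/mms.crt;   # hypothetical paths
    ssl_certificate_key /etc/nginx/certs/mms.key;

    location / {
        auth_basic           "MMS";
        auth_basic_user_file /etc/nginx/.htpasswd;  # hypothetical
        proxy_pass http://127.0.0.1:8080;           # MMS default inference port
    }
}
```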