multi-model-server by awslabs

CLI tool for serving deep learning models from any ML/DL framework

created 7 years ago
1,012 stars

Top 37.6% on sourcepulse

Project Summary

Multi Model Server (MMS) is a tool for serving deep learning models trained with any framework. It exposes HTTP endpoints for inference requests and is aimed at ML engineers and researchers who need a flexible, easy-to-use inference server that simplifies deployment and scaling.

How It Works

MMS uses a worker-based architecture in which each worker handles model inference. The number of workers scales automatically with the available CPU or GPU resources. Models are packaged into .mar archives, which contain the model artifacts and the inference logic, so they are easy to distribute and deploy.
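
To make the "inference logic" inside a .mar archive more concrete, here is a minimal sketch of what a custom handler module might look like. It assumes the convention of a module-level handle(data, context) entry point that receives a batch of requests as a list of dicts with a "body" or "data" key; the ByteCountService class and its byte-counting "inference" are hypothetical placeholders, not code from the project.

```python
class ByteCountService:
    """Hypothetical stand-in for real inference logic: reports payload sizes."""

    def __init__(self):
        self.initialized = False

    def initialize(self, context):
        # A real handler would load model artifacts here, for example from
        # a model directory exposed through the context object (assumed).
        self.initialized = True

    def inference(self, batch):
        # Assumed request shape: each item is a dict carrying the raw
        # payload under "body" or "data".
        results = []
        for request in batch:
            payload = request.get("body") or request.get("data") or b""
            results.append({"num_bytes": len(payload)})
        return results


# One service instance per worker process, initialized lazily.
_service = ByteCountService()


def handle(data, context):
    """Module-level entry point that each worker is assumed to call."""
    if not _service.initialized:
        _service.initialize(context)
    if data is None:
        return None
    return _service.inference(data)
```

The handler returns one result per request in the batch. In a real archive, the placeholder inference method would be replaced by preprocessing, a framework-specific forward pass, and postprocessing.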

Quick Start & Requirements

  • Install: pip install multi-model-server
  • Prerequisites: Ubuntu, CentOS, or macOS; Python; pip; Java 8. MXNet (CPU: mxnet-mkl, GPU: mxnet-cu92mkl) must be installed separately.
  • Example: multi-model-server --start --models squeezenet=https://s3.amazonaws.com/model-server/model_archive_1.0/squeezenet_v1.1.mar
  • Docs: https://github.com/awslabs/multi-model-server/tree/master/docs
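
Once the example server above is running, an inference request could be sent from Python roughly as follows. This is a sketch under stated assumptions: it assumes the inference API listens on 127.0.0.1:8080 and serves predictions at /predictions/<model_name>, so check the project docs for the actual host, port, and path. The helper names prediction_url, build_request, and predict are hypothetical.

```python
import urllib.request


def prediction_url(model_name, host="127.0.0.1", port=8080):
    # Assumed endpoint layout: /predictions/<model_name> on the inference port.
    return f"http://{host}:{port}/predictions/{model_name}"


def build_request(model_name, payload, host="127.0.0.1", port=8080):
    # POST the raw payload bytes (for example, an image file) as the body.
    url = prediction_url(model_name, host=host, port=port)
    return urllib.request.Request(url, data=payload, method="POST")


def predict(model_name, payload, timeout=10.0):
    # Sends the request and returns the raw response body as bytes.
    request = build_request(model_name, payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

For the squeezenet model from the example, a call might look like predict("squeezenet", open("kitten.jpg", "rb").read()), where kitten.jpg is any local image file.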

Highlighted Details

  • Supports models from any ML/DL framework.
  • Automatic scaling of workers to match CPU/GPU resources.
  • Model packaging into .mar archives for easy deployment.
  • Includes Dockerfiles for production deployments.

Maintenance & Community

  • A Slack channel is available for community interaction.
  • Contributions via GitHub issues and pull requests are welcome.

Licensing & Compatibility

  • License: Apache License 2.0.
  • Compatibility: Suitable for commercial use and integration with closed-source applications.

Limitations & Caveats

MMS does not provide built-in authentication, throttling, or SSL, requiring external solutions for production security. Default network access is restricted to localhost. Windows support is experimental.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 90 days
