serve by pytorch

Serve, optimize, and scale PyTorch models in production

Created 6 years ago
4,349 stars

Top 11.3% on SourcePulse

View on GitHub
Project Summary

TorchServe is a flexible and easy-to-use tool for serving and scaling PyTorch models in production, targeting ML engineers and researchers. It simplifies deployment across various environments, including on-premise, cloud, and Kubernetes, offering features for model management, inference APIs, and performance optimization.

How It Works

TorchServe acts as a dedicated model server, abstracting away the complexities of deploying PyTorch models. It exposes REST and gRPC APIs for inference, handles model versioning, and supports composing models into complex workflows expressed as DAGs. Its architecture is designed for scalability and efficiency, integrating with various hardware accelerators and optimization paths such as TensorRT and ONNX.
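
For illustration, here is a minimal Python sketch of a client calling the REST inference API; the model name "resnet-18" and the input image are assumptions and must match a model you have actually registered (by default the inference API listens on port 8080):

    import requests  # third-party HTTP client

    # POST the raw payload to /predictions/<model_name> on the inference port.
    with open("kitten.jpg", "rb") as f:
        payload = f.read()

    response = requests.post(
        "http://localhost:8080/predictions/resnet-18",
        data=payload,
    )
    response.raise_for_status()
    print(response.json())  # whatever the model's handler returns, e.g. class scores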

Quick Start & Requirements

  • Installation: pip install torchserve or conda install -c pytorch torchserve. Docker images are also available (pytorch/torchserve).
  • Prerequisites: Python >= 3.8. Accelerator support requires specific flags during dependency installation (e.g., --cuda=cu121, --rocm=rocm61). LLM deployment may require huggingface-cli login.
  • Resources: LLM deployment examples call for significant shared memory (e.g., the Docker flag --shm-size 10g) and GPU capacity.
  • Documentation: Getting started guide, Docker details, LLM deployment.
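
Serving a model typically means packaging the weights plus a handler into a .mar archive with torch-model-archiver and pointing torchserve at the resulting model store. The sketch below is a hypothetical custom handler built on TorchServe's BaseHandler; the built-in handlers (image_classifier, text_classifier, etc.) cover many common cases, so a custom one is only needed for bespoke pre- and post-processing.

    # my_handler.py -- hypothetical custom handler for TorchServe
    import torch
    from ts.torch_handler.base_handler import BaseHandler

    class MyHandler(BaseHandler):
        """BaseHandler.initialize() loads the serialized model; only the
        pre- and post-processing steps are overridden here."""

        def preprocess(self, data):
            # Each batched request arrives as a list of {"data"/"body": payload} dicts.
            # Assumes clients send JSON arrays of numbers; adjust for images/bytes.
            rows = [row.get("data") or row.get("body") for row in data]
            return torch.as_tensor(rows, dtype=torch.float32)

        def postprocess(self, outputs):
            # Must return one JSON-serializable result per request in the batch.
            return outputs.tolist()

torch-model-archiver then bundles this handler with the serialized model into the .mar file that torchserve loads from its model store.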

Highlighted Details

  • Supports serving PyTorch models on AWS SageMaker and Google Vertex AI.
  • Integrates with Kubernetes, KServe, and Kubeflow for scalable deployments.
  • Offers out-of-the-box support for performance optimization, benchmarking, and profiling.
  • Features include TorchScript, PyTorch Compiler (preview), ONNX, TensorRT, and FlashAttention integration.
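
As one concrete example of the TorchScript path, a model can be exported ahead of time with standard PyTorch tracing and the resulting file handed to torch-model-archiver; the model choice below (torchvision's resnet18) is only an assumption for illustration.

    import torch
    import torchvision  # assumed to be installed for this example

    # Export an eager model to TorchScript via tracing.
    model = torchvision.models.resnet18(weights="DEFAULT").eval()
    example_input = torch.rand(1, 3, 224, 224)
    traced = torch.jit.trace(model, example_input)
    traced.save("resnet-18.pt")  # serialized file to package into a .mar archive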

Maintenance & Community

This project is no longer actively maintained; no further updates, bug fixes, or security patches are planned. It was developed and maintained jointly by Amazon (AWS) and Meta.

Licensing & Compatibility

Apache 2.0 License. Compatible with commercial and closed-source applications.

Limitations & Caveats

The project is in limited-maintenance mode, meaning no future updates or security patches will be provided, so newly discovered vulnerabilities may go unaddressed. Note that security features such as token authorization are enabled by default in recent releases.
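
Because token authorization is on by default in recent releases, clients must present a key with each request. The sketch below assumes the server wrote its generated keys to key_file.json in its working directory and that the inference key lives under keys["inference"]["key"]; check the security documentation for your version, as the file name and layout may differ.

    import json
    import requests

    # Assumption: TorchServe generated key_file.json on startup in its working directory.
    with open("key_file.json") as f:
        keys = json.load(f)

    token = keys["inference"]["key"]  # assumed layout; verify against your key file

    with open("kitten.jpg", "rb") as f:
        response = requests.post(
            "http://localhost:8080/predictions/resnet-18",
            data=f.read(),
            headers={"Authorization": f"Bearer {token}"},
        )
    print(response.status_code)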

Health Check

Last Commit: 1 month ago
Responsiveness: Inactive
Pull Requests (30d): 1
Issues (30d): 0

Star History

7 stars in the last 30 days

Explore Similar Projects

Starred by Amanpreet Singh (Cofounder of Contextual AI), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 7 more.

truss by basetenlabs

0.2%
1k
Model deployment tool for productionizing AI/ML models
Created 3 years ago
Updated 1 day ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Luis Capelo (Cofounder of Lightning AI), and 3 more.

LitServe by Lightning-AI

0.3%
4k
AI inference pipeline framework
Created 1 year ago
Updated 1 day ago
Starred by Eugene Yan (AI Scientist at AWS), Jared Palmer (Ex-VP AI at Vercel; Founder of Turborepo; Author of Formik, TSDX), and 4 more.

seldon-core by SeldonIO

0.2%
5k
MLOps framework for production model deployment on Kubernetes
Created 7 years ago
Updated 13 hours ago