serve by pytorch

Serve, optimize, and scale PyTorch models in production

created 5 years ago
4,341 stars

Top 11.5% on sourcepulse

View on GitHub
Project Summary

TorchServe is a flexible and easy-to-use tool for serving and scaling PyTorch models in production, targeting ML engineers and researchers. It simplifies deployment across various environments, including on-premise, cloud, and Kubernetes, offering features for model management, inference APIs, and performance optimization.

How It Works

TorchServe acts as a dedicated model server, abstracting away the complexities of deploying PyTorch models. It supports REST and gRPC for inference, handles model versioning, and allows for complex workflows using DAGs. Its architecture is designed for scalability and efficiency, integrating with various hardware accelerators and optimization frameworks like TensorRT and ONNX.
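As a sketch of the REST interface: assuming a TorchServe instance is already running locally with a model registered under the (hypothetical) name `resnet-18`, inference and management requests look like this. Ports 8080 (inference) and 8081 (management) are TorchServe's defaults.

```shell
# Hypothetical example: assumes a running TorchServe with a model
# registered as "resnet-18" and an input image input.jpg on disk.

# Inference via the REST API (default port 8080)
curl http://localhost:8080/predictions/resnet-18 -T input.jpg

# Management API (default port 8081): list registered models
curl http://localhost:8081/models
```

gRPC clients follow the same model-name addressing; the REST calls above are the quickest way to smoke-test a deployment.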

Quick Start & Requirements

  • Installation: pip install torchserve or conda install -c pytorch torchserve. Docker images are also available (pytorch/torchserve).
  • Prerequisites: Python >= 3.8. Accelerator support requires specific flags during dependency installation (e.g., --cuda=cu121, --rocm=rocm61). LLM deployment may require huggingface-cli login.
  • Resources: LLM deployment examples suggest significant memory (--shm-size 10g) and GPU resources.
  • Documentation: Getting started guide, Docker details, LLM deployment.
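Putting the steps above together, a minimal local deployment might look like the following sketch. The model name `my_model` and the serialized file `model.pt` are placeholders; `image_classifier` is one of TorchServe's built-in handlers.

```shell
# Sketch of a minimal end-to-end deployment; names are placeholders.
pip install torchserve torch-model-archiver

# Package a serialized model plus a handler into a .mar archive
mkdir -p model_store
torch-model-archiver --model-name my_model --version 1.0 \
    --serialized-file model.pt --handler image_classifier \
    --export-path model_store

# Start the server and register the archived model
torchserve --start --model-store model_store --models my_model=my_model.mar

# Stop the server when done
torchserve --stop
```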

Highlighted Details

  • Supports serving PyTorch models on AWS SageMaker and Google Vertex AI.
  • Integrates with Kubernetes, KServe, and Kubeflow for scalable deployments.
  • Offers out-of-the-box support for performance optimization, benchmarking, and profiling.
  • Features include TorchScript, PyTorch Compiler (preview), ONNX, TensorRT, and FlashAttention integration.

Maintenance & Community

This project is no longer actively maintained, with no planned updates, bug fixes, or security patches. It was jointly developed by Amazon and Meta.

Licensing & Compatibility

Apache 2.0 License. Compatible with commercial and closed-source applications.

Limitations & Caveats

The project is in limited-maintenance mode: no future updates or security patches will be provided, so users should be aware that vulnerabilities may go unaddressed. Note that security features such as token authorization are enabled by default in recent releases.

Health Check
Last commit

3 weeks ago

Responsiveness

1 day

Pull Requests (30d)
6
Issues (30d)
0
Star History
41 stars in the last 90 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 2 more.

gpustack by gpustack

1.6%
3k
GPU cluster manager for AI model deployment
created 1 year ago
updated 2 days ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 5 more.

TensorRT-LLM by NVIDIA

0.6%
11k
LLM inference optimization SDK for NVIDIA GPUs
created 1 year ago
updated 13 hours ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Nat Friedman (Former CEO of GitHub), and 32 more.

llama.cpp by ggml-org

0.4%
84k
C/C++ library for local LLM inference
created 2 years ago
updated 9 hours ago