serve by pytorch

Serve, optimize, and scale PyTorch models in production

created 5 years ago
4,341 stars

Top 11.5% on sourcepulse

View on GitHub
Project Summary

TorchServe is a flexible and easy-to-use tool for serving and scaling PyTorch models in production, targeting ML engineers and researchers. It simplifies deployment across various environments, including on-premise, cloud, and Kubernetes, offering features for model management, inference APIs, and performance optimization.

How It Works

TorchServe acts as a dedicated model server, abstracting away the complexities of deploying PyTorch models. It supports REST and gRPC for inference, handles model versioning, and allows for complex workflows using DAGs. Its architecture is designed for scalability and efficiency, integrating with various hardware accelerators and optimization frameworks like TensorRT and ONNX.
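As a sketch of the REST interface: assuming a TorchServe instance is already running locally with a model registered under the (hypothetical) name `resnet-18`, inference and management requests look like this. Ports 8080 (inference) and 8081 (management) are TorchServe's defaults.

```shell
# Hypothetical example: assumes a running TorchServe with a model
# registered as "resnet-18" and an input image input.jpg on disk.

# Inference via the REST API (default port 8080)
curl http://localhost:8080/predictions/resnet-18 -T input.jpg

# Management API (default port 8081): list registered models
curl http://localhost:8081/models
```

gRPC clients follow the same model-name addressing; the REST calls above are the quickest way to smoke-test a deployment.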

Quick Start & Requirements

  • Installation: pip install torchserve or conda install -c pytorch torchserve. Docker images are also available (pytorch/torchserve).
  • Prerequisites: Python >= 3.8. Accelerator support requires specific flags during dependency installation (e.g., --cuda=cu121, --rocm=rocm61). LLM deployment may require huggingface-cli login.
  • Resources: LLM deployment examples suggest significant memory (--shm-size 10g) and GPU resources.
  • Documentation: Getting started guide, Docker details, LLM deployment.
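Putting the steps above together, a minimal local deployment might look like the following sketch. The model name `my_model` and the serialized file `model.pt` are placeholders; `image_classifier` is one of TorchServe's built-in handlers.

```shell
# Sketch of a minimal end-to-end deployment; names are placeholders.
pip install torchserve torch-model-archiver

# Package a serialized model plus a handler into a .mar archive
mkdir -p model_store
torch-model-archiver --model-name my_model --version 1.0 \
    --serialized-file model.pt --handler image_classifier \
    --export-path model_store

# Start the server and register the archived model
torchserve --start --model-store model_store --models my_model=my_model.mar

# Stop the server when done
torchserve --stop
```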

Highlighted Details

  • Supports serving PyTorch models on AWS SageMaker and Google Vertex AI.
  • Integrates with Kubernetes, KServe, and Kubeflow for scalable deployments.
  • Offers out-of-the-box support for performance optimization, benchmarking, and profiling.
  • Features include TorchScript, PyTorch Compiler (preview), ONNX, TensorRT, and FlashAttention integration.

Maintenance & Community

This project is no longer actively maintained, with no planned updates, bug fixes, or security patches. It was jointly developed by Amazon and Meta.

Licensing & Compatibility

Apache 2.0 License. Compatible with commercial and closed-source applications.

Limitations & Caveats

The project is in limited-maintenance mode: no future updates or security patches will be provided, so users should be aware that vulnerabilities may go unaddressed. Note that security features such as token authorization are enabled by default in recent releases.

Health Check
Last commit

3 weeks ago

Responsiveness

1 day

Pull Requests (30d)
6
Issues (30d)
0
Star History
41 stars in the last 90 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 2 more.

gpustack by gpustack

1.6%
3k
GPU cluster manager for AI model deployment
created 1 year ago
updated 2 days ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Omar Sanseviero (DevRel at Google DeepMind), and 5 more.

TensorRT-LLM by NVIDIA

0.6%
11k
LLM inference optimization SDK for NVIDIA GPUs
created 1 year ago
updated 13 hours ago
Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Nat Friedman (Former CEO of GitHub), and 32 more.

llama.cpp by ggml-org

0.4%
84k
C/C++ library for local LLM inference
created 2 years ago
updated 9 hours ago