Serve, optimize, and scale PyTorch models in production
TorchServe is a flexible, easy-to-use tool for serving and scaling PyTorch models in production, aimed at ML engineers and researchers. It simplifies deployment across on-premises, cloud, and Kubernetes environments, offering model management, inference APIs, and performance optimization.
How It Works
TorchServe acts as a dedicated model server, abstracting away the complexities of deploying PyTorch models. It exposes REST and gRPC APIs for inference, handles model versioning, and supports multi-model workflows expressed as DAGs. The architecture is designed for scalability and efficiency, integrating with hardware accelerators and optimization runtimes such as TensorRT and ONNX.
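As a sketch of the request path, assuming a server running on the default ports with a model named mymodel already registered (the model name and input file are illustrative):

    # query the REST inference API (default port 8080)
    curl http://localhost:8080/predictions/mymodel -T kitten.jpg

    # list registered models via the management API (default port 8081)
    curl http://localhost:8081/models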
Quick Start & Requirements
Install via pip install torchserve or conda install -c pytorch torchserve. Docker images are also available (pytorch/torchserve). For GPU support, accelerator flags are passed when installing dependencies (--cuda=cu121, --rocm=rocm61).
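A minimal end-to-end sketch, assuming a TorchScript model file model.pt and the built-in image_classifier handler (the model name and paths are placeholders):

    # package the trained model into a .mar archive
    mkdir -p model_store
    torch-model-archiver --model-name mymodel --version 1.0 \
        --serialized-file model.pt --handler image_classifier \
        --export-path model_store

    # start the server and load the archive from the model store
    torchserve --start --ncs --model-store model_store --models mymodel=mymodel.mar

    # stop the server when finished
    torchserve --stop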
LLM deployment may require huggingface-cli login. Docker deployments may need increased shared memory (--shm-size 10g) and GPU resources.
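For LLMs, a hedged sketch using the launcher module shipped with TorchServe (the model ID is illustrative, and a prior huggingface-cli login is assumed for gated weights):

    # authenticate so gated weights can be pulled from the Hugging Face Hub
    huggingface-cli login

    # spin up an LLM endpoint with the bundled launcher
    python -m ts.llm_launcher --model_id meta-llama/Meta-Llama-3-8B-Instruct --disable_token_auth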
Highlighted Details
Maintenance & Community
This project is no longer actively maintained; no further updates, bug fixes, or security patches are planned. It was jointly developed by Amazon and Meta.
Licensing & Compatibility
Apache 2.0 License. Compatible with commercial and closed-source applications.
Limitations & Caveats
The project is in limited maintenance, meaning no future updates or security patches will be provided. Users should be aware of potential unaddressed vulnerabilities. Security features like token authorization are now enabled by default.
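With token authorization on, each request must carry a key; a sketch, assuming the key is taken from the key_file.json the server writes at startup (the model name and input file are placeholders):

    # pass the generated inference key with the request
    curl http://localhost:8080/predictions/mymodel -T input.jpg \
        -H "Authorization: Bearer <inference-key-from-key_file.json>"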