TensorRT by pytorch

PyTorch compiler for NVIDIA GPUs using TensorRT

Created 5 years ago
2,859 stars

Top 16.7% on SourcePulse

View on GitHub
Project Summary

Torch-TensorRT enables significant inference acceleration for PyTorch models on NVIDIA GPUs. It targets PyTorch users, researchers, and developers seeking to optimize model performance, offering up to 5x latency reduction with minimal code changes.

How It Works

Torch-TensorRT integrates with PyTorch's compilation pipeline (torch.compile) and export workflows. It leverages TensorRT, NVIDIA's SDK for high-performance deep learning inference, to optimize PyTorch models. The compiler converts PyTorch operations into TensorRT engines, fusing layers and applying hardware-specific optimizations for NVIDIA GPUs. This approach allows for seamless integration and substantial performance gains without requiring manual model rewriting.
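The torch.compile integration described above can be sketched in a few lines. This is a hedged sketch, not the project's canonical example: it assumes the `torch_tensorrt` package is installed and a CUDA GPU is available, and guards both so the snippet still runs (unoptimized) without them. The `backend="tensorrt"` name matches the project's documented usage.

```python
import torch

# A tiny stand-in model; any nn.Module works the same way.
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, 3, padding=1),
    torch.nn.ReLU(),
).eval()

x = torch.randn(1, 3, 224, 224)

try:
    import torch_tensorrt  # importing registers the "tensorrt" backend
    if torch.cuda.is_available():
        model, x = model.cuda(), x.cuda()
        # One-line optimization: subgraphs TensorRT supports run as fused
        # TensorRT engines; unsupported ops fall back to PyTorch eager.
        model = torch.compile(model, backend="tensorrt")
except ImportError:
    pass  # torch_tensorrt not installed; run the model unoptimized

with torch.no_grad():
    y = model(x)
print(y.shape)  # torch.Size([1, 16, 224, 224])
```

Because unsupported operations fall back to eager execution, the compiled module remains a drop-in replacement for the original.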

Quick Start & Requirements

  • Install: pip install torch-tensorrt (stable) or pip install --pre torch-tensorrt --index-url https://download.pytorch.org/whl/nightly/cu128 (nightly). Also available in NVIDIA NGC PyTorch Container.
  • Prerequisites: PyTorch (tested with 2.8 nightly), CUDA 12.8, TensorRT 10.9.0.43.
  • Platform Support: Linux AMD64/GPU, Windows/GPU (Dynamo only), Linux aarch64/GPU (JetPack-4.4+). ppc64le/GPU not supported.
  • Docs: https://nvidia.github.io/Torch-TensorRT/

Highlighted Details

  • Achieves up to 5x lower inference latency compared to eager execution.
  • Supports torch.compile backend for one-line optimization.
  • Provides an export workflow for C++ deployment via TorchScript.
  • Includes tools for resolving graph breaks and boosting performance.
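The export workflow for C++ deployment can be sketched as follows. This is a hedged sketch assuming `torch_tensorrt` and a CUDA GPU; the `torch_tensorrt.compile(..., ir="dynamo")` and `torch_tensorrt.save(..., output_format="torchscript")` calls reflect recent Torch-TensorRT releases, and the snippet falls back to plain TorchScript tracing so it runs anywhere.

```python
import torch

model = torch.nn.Linear(4, 2).eval()
example = torch.randn(1, 4)

compiled = False
try:
    import torch_tensorrt
    if torch.cuda.is_available():
        model, example = model.cuda(), example.cuda()
        # Ahead-of-time compilation through the Dynamo/torch.export path.
        trt_mod = torch_tensorrt.compile(model, ir="dynamo", inputs=[example])
        # Serialize as TorchScript so the optimized module can be loaded
        # from C++ via torch::jit::load, with no Python dependency.
        torch_tensorrt.save(trt_mod, "trt_model.ts",
                            output_format="torchscript", inputs=[example])
        compiled = True
except ImportError:
    pass

if not compiled:
    # Fallback: ordinary TorchScript tracing, same .ts artifact format.
    torch.jit.trace(model, example).save("trt_model.ts")

loaded = torch.jit.load("trt_model.ts")
print(loaded(example).shape)  # torch.Size([1, 2])
```

The resulting `.ts` file is self-contained, so a C++ application can deploy it without a Python runtime.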

Maintenance & Community

The project is actively maintained by NVIDIA. Details on contributing can be found in CONTRIBUTING.md.

Licensing & Compatibility

Licensed under BSD-3-Clause, permitting commercial use and integration with closed-source applications.

Limitations & Caveats

Windows support is limited to the Dynamo backend. Older ARM64 platforms (JetPack < 4.4) require version 1.0.0. Support for specific TensorRT and PyTorch versions is tested, and compatibility with other versions is not guaranteed.

Health Check

  • Last Commit: 17 hours ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 42
  • Issues (30d): 21
  • Star History: 25 stars in the last 30 days

Explore Similar Projects

Starred by Luis Capelo (Cofounder of Lightning AI), Alex Yu (Research Scientist at OpenAI; Former Cofounder of Luma AI), and 7 more.

TransformerEngine by NVIDIA

0.4%
3k
Library for Transformer model acceleration on NVIDIA GPUs
Created 3 years ago
Updated 19 hours ago
Starred by Nat Friedman (Former CEO of GitHub), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 15 more.

FasterTransformer by NVIDIA

0.1%
6k
Optimized transformer library for inference
Created 4 years ago
Updated 1 year ago
Starred by Jeff Hammerbacher (Cofounder of Cloudera), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 20 more.

TensorRT-LLM by NVIDIA

0.5%
12k
LLM inference optimization SDK for NVIDIA GPUs
Created 2 years ago
Updated 12 hours ago
Starred by Peter Norvig (Author of "Artificial Intelligence: A Modern Approach"; Research Director at Google), Alexey Milovidov (Cofounder of Clickhouse), and 29 more.

llm.c by karpathy

0.2%
28k
LLM training in pure C/CUDA, no PyTorch needed
Created 1 year ago
Updated 2 months ago