TensorRT by NVIDIA

SDK for accelerated deep learning inference on NVIDIA GPUs

Created 7 years ago

13,169 stars

Top 4.0% on SourcePulse

9 Experts Love This Project

Luis Capelo

Cofounder of Lightning AI

Yineng Zhang

Inference Lead at SGLang; Research Scientist at Together AI

Travis Fischer

Founder of Agentic

Benjamin Bolte

Cofounder of K-Scale Labs

and 5 more!

Project Summary

NVIDIA TensorRT is an SDK for high-performance deep learning inference on NVIDIA GPUs, providing optimized runtimes and tools. This repository hosts the open-source components of TensorRT, including plugins and parsers, which developers can use to accelerate AI inference workflows. The latest release, TensorRT 11.0, streamlines the API: it removes weakly-typed networks, implicit quantization, and IPluginV2, and standardizes on strongly typed networks and explicit quantization.

How It Works

TensorRT optimizes deep learning models for inference by applying techniques such as layer and tensor fusion, kernel auto-tuning, and reduced-precision execution. The open-source components handle model ingestion through parsers (e.g., ONNX) and support custom operations through a plugin system. The optimizations target NVIDIA hardware specifically, with the goal of maximizing throughput and minimizing latency for deployed models.
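To make "layer fusion" concrete, here is a minimal sketch in plain Python. It folds a per-channel scale and shift (a batch-norm-like step) into the preceding linear layer, so one layer does the work of two. This is an illustration of the general idea only, not TensorRT code or its actual fusion rules; all function names are hypothetical.

```python
# Illustrative only: not TensorRT code. All names here are hypothetical.

def linear(x, weights, bias):
    """Compute y[j] = sum_i weights[j][i] * x[i] + bias[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def scale_shift(y, scale, shift):
    """Apply a per-channel affine step: z[j] = scale[j] * y[j] + shift[j]."""
    return [s * yi + t for yi, s, t in zip(y, scale, shift)]

def fuse_linear_scale_shift(weights, bias, scale, shift):
    """Fold the affine step into the linear layer's parameters.

    scale * (W x + b) + shift == (scale * W) x + (scale * b + shift)
    """
    fused_w = [[s * w for w in row] for row, s in zip(weights, scale)]
    fused_b = [s * b + t for b, s, t in zip(bias, scale, shift)]
    return fused_w, fused_b

x = [1.0, 2.0, 3.0]
weights = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
bias = [0.25, -0.75]
scale = [2.0, 0.5]
shift = [1.0, -1.0]

# Two separate layers versus one fused layer.
unfused = scale_shift(linear(x, weights, bias), scale, shift)
fused_w, fused_b = fuse_linear_scale_shift(weights, bias, scale, shift)
fused = linear(x, fused_w, fused_b)
```

Both paths produce the same values, but the fused path makes a single pass over the data. Fewer passes generally means fewer kernel launches and fewer intermediate tensors.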

Quick Start & Requirements

Primary Install: pip install tensorrt
Build Prerequisites:
- TensorRT GA build (v11.0.0.114 recommended)
- CUDA (versions 13.2.0 or 12.9.0 recommended)
- cuDNN (optional, v8.9 recommended)
- GNU make (>= v4.1)
- cmake (>= v3.31)
- Python (>= v3.10, <= v3.13.x)
- pip (>= v19.0)
- git, pkg-config, wget
- Optional: NCCL (>= v2.19, < v3.0) for multi-device support.
- Containerized builds require Docker (>= 19.03) and NVIDIA Container Toolkit.
Links:
- Import Workflows Guide: [See README]
- Supported Models: [See README]
- Contribution Guide: [See README]
- Changelog: [See README]
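The version bounds in the build prerequisites can be checked before starting a build. The following is a hedged sketch using only the Python standard library; the helper names (`parse_version`, `meets_minimum`, `installed_version`, `python_supported`) are hypothetical and not part of the repository, and the minimums are copied from the list above.

```python
import re
import shutil
import subprocess
import sys

# Minimum versions copied from the prerequisites list (Docker 19.03 -> (19, 3)).
MINIMUMS = {"make": (4, 1), "cmake": (3, 31), "pip": (19, 0), "docker": (19, 3)}

def parse_version(text):
    """Return the first dotted version number found in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))

def meets_minimum(found, minimum):
    """Compare version tuples, padding the shorter one with zeros."""
    width = max(len(found), len(minimum))
    pad = lambda v: tuple(v) + (0,) * (width - len(v))
    return pad(found) >= pad(minimum)

def installed_version(tool):
    """Run `<tool> --version` and parse the result; None if the tool is absent."""
    if shutil.which(tool) is None:
        return None
    result = subprocess.run([tool, "--version"], capture_output=True,
                            text=True, check=False)
    return parse_version(result.stdout or result.stderr)

def python_supported(version_info=sys.version_info):
    """True when the interpreter is within the listed 3.10 to 3.13.x range."""
    return (3, 10) <= tuple(version_info[:2]) <= (3, 13)

def report():
    """Print one line per tool; intended to be called manually."""
    print("python", "ok" if python_supported() else "unsupported")
    for tool, minimum in MINIMUMS.items():
        found = installed_version(tool)
        status = "missing" if found is None else (
            "ok" if meets_minimum(found, minimum) else "too old")
        print(tool, found, status)
```

Calling `report()` prints a status for each tool found on `PATH`. CUDA, cuDNN, and NCCL are left out because their version discovery depends on the local install layout.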

Highlighted Details

- TensorRT 11.0 released with new capabilities for AI inference acceleration.
- API streamlined with removal of weakly-typed networks, implicit quantization, and IPluginV2.
- Introduces Strongly Typed Networks and Explicit Quantization for improved control.
- Supports various import paths including ONNX, Torch-TensorRT, and HuggingFace/Optimum.
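Explicit quantization means the model itself carries quantize and dequantize steps with their scales, rather than having the builder infer them. The arithmetic behind one such pair can be sketched as follows. The sketch assumes symmetric INT8 with a zero point of 0 and round-half-to-even rounding; the function names are hypothetical and this is not the TensorRT API.

```python
# Illustrative quantize/dequantize arithmetic; not the TensorRT API.
# Assumes symmetric INT8 (zero point 0) with a caller-supplied scale.

def quantize(x, scale, qmin=-128, qmax=127):
    """Map a float to an INT8 code: round(x / scale), clamped to [qmin, qmax].

    Python's round() rounds halves to the nearest even integer.
    """
    q = round(x / scale)
    return max(qmin, min(qmax, q))

def dequantize(q, scale):
    """Map an INT8 code back to a float."""
    return q * scale

def fake_quantize(x, scale):
    """Quantize then dequantize, exposing the rounding and clamping error."""
    return dequantize(quantize(x, scale), scale)

scale = 0.5
codes = [quantize(v, scale) for v in (1.3, 0.25, 0.75, 100.0, -100.0)]
# codes == [3, 0, 2, 127, -128]; the last two values saturate.
```

For inputs inside the representable range the round-trip error is at most `scale / 2`. Values outside it saturate at the clamp limits, which is why the choice of scale matters.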

Maintenance & Community

The project provides a Contribution Guide and Coding Guidelines for code contributions. Updates are detailed in the Changelog. Community engagement is encouraged via TensorRT and Triton community channels. Enterprise support is available through NVIDIA AI Enterprise.

Licensing & Compatibility

The README does not explicitly state the license for the open-source components. Before using them commercially or linking them into closed-source software, confirm the licensing terms.

Limitations & Caveats

TensorRT 11.0 removes weakly-typed networks, implicit quantization, and the IPluginV2 APIs, so existing code that relies on them must migrate to strongly typed networks, explicit quantization, and a newer plugin interface. Python bindings for versions older than 3.9 have been removed, and RPM packages now depend on Python 3.12. The TREX tool has been replaced by Nsight Deep Learning Designer.

Health Check

Last Commit

2 weeks ago

Responsiveness

Inactive

Star History

77 stars in the last 30 days