aitune by ai-dynamo

Inference toolkit for optimizing PyTorch models on NVIDIA GPUs

Created 2 months ago
260 stars

Top 97.4% on SourcePulse

Project Summary

NVIDIA AITune is an inference toolkit designed for tuning and deploying deep learning models, optimized for NVIDIA GPUs. It addresses the need for significantly improved inference speed and efficiency across diverse AI workloads by automating the compilation and conversion of PyTorch models and pipelines. Using a unified Python API, it enables seamless tuning with backends like TensorRT, Torch-TensorRT, TorchAO, and Torch Inductor, preparing models for production with minimal code changes.

How It Works

AITune optimizes PyTorch models at the nn.Module level via two modes: Ahead-of-Time (AOT) for greater control and Just-in-Time (JIT) for zero code modification. AOT requires explicit code wrapping, while JIT can be enabled via an environment variable or import for on-the-fly tuning. The system supports multiple backends (TensorRT, Torch-TensorRT, TorchAO, Torch Inductor) and employs strategies like FirstWinsStrategy or HighestThroughputStrategy to automatically select optimal backend configurations based on performance metrics. It automates model export, conversion, correctness testing, and profiling.
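
The two selection strategies can be illustrated with a small, generic sketch. The candidate tuples and helper functions below are hypothetical stand-ins for illustration only, not AITune's actual API; FirstWinsStrategy keeps the first backend that works, while HighestThroughputStrategy benchmarks all working backends and keeps the fastest.

```python
# Hypothetical stand-ins illustrating the two strategies; AITune's real API differs.

def first_wins(candidates):
    """Return the first candidate backend that compiled successfully."""
    for name, compile_ok, throughput in candidates:
        if compile_ok:
            return name
    return None

def highest_throughput(candidates):
    """Profile every working candidate and keep the fastest one."""
    working = [(throughput, name) for name, ok, throughput in candidates if ok]
    return max(working)[1] if working else None

# Example candidates: (backend name, compiled successfully?, measured images/sec)
candidates = [
    ("torch-inductor", True, 850.0),
    ("torch-tensorrt", True, 1420.0),
    ("tensorrt", False, 0.0),  # e.g. an unsupported op blocked conversion
]

print(first_wins(candidates))          # first backend that worked
print(highest_throughput(candidates))  # fastest backend that worked
```

The trade-off mirrors the document's description: FirstWins avoids profiling every backend, while HighestThroughput pays that cost to maximize performance.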

Quick Start & Requirements

  • Installation: Recommended: pip install --extra-index-url https://pypi.nvidia.com aitune. Source install supported.
  • Prerequisites: Linux (Ubuntu 22.04+ recommended), Python 3.10+, PyTorch 2.7+, TensorRT 10.5.0+ (for the TensorRT backend), NVIDIA GPU.
  • Environment: NGC Containers for PyTorch suggested.
  • Links: Examples Catalog, Documentation, GitHub Issues.
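
The install command and JIT/NVTX environment variables documented in this summary combine into a minimal setup; `infer.py` below is a placeholder for your own inference script.

```shell
# Install AITune from NVIDIA's package index (as documented above)
pip install --extra-index-url https://pypi.nvidia.com aitune

# Enable JIT tuning with no code changes, plus optional NVTX profiling;
# "infer.py" is a placeholder for your own PyTorch inference script.
export AUTOWRAPT_BOOTSTRAP=aitune_enable_jit_tuning
export NVTX_ENABLE=1
python infer.py
```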

Highlighted Details

  • Ease-of-use: Single-line code integration for tuning paths.
  • Wide Backend Support: TensorRT, Torch-TensorRT, TorchAO, Torch Inductor.
  • Model & Pipeline Tuning: Enhances performance for ResNet, BERT, and Stable Diffusion pipelines.
  • JIT Tuning: Enables tuning without code mods via AUTOWRAPT_BOOTSTRAP=aitune_enable_jit_tuning or import aitune.torch.jit.enable.
  • NVTX Profiling: Integrates NVTX for detailed profiling (export NVTX_ENABLE=1).
  • Correctness Testing: Validates tuned models with provided data.

Maintenance & Community

  • Status: First release; API subject to change.
  • Community: Primarily via GitHub Issues. Links to Contributing and Changelog available.

Licensing & Compatibility

  • License: Not detailed in the provided README.
  • Compatibility: Targets PyTorch models on NVIDIA GPU hardware.

Limitations & Caveats

  • First Release: API is unstable.
  • JIT Mode: Lacks caching, requires re-tuning on each start, and cannot perform direct benchmarking due to unknown batch sizes, limiting strategy support.
  • Backend Specifics: Not all backends are universally compatible with all models due to underlying compilation technology constraints.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 249 stars in the last 30 days

Explore Similar Projects

Starred by George Hotz (author of tinygrad; founder of the tiny corp, comma.ai) and Carol Willing (core contributor to CPython, Jupyter).

ai-performance-engineering by cfregly (1.2%, 1k stars)
AI Systems Performance Engineering for modern AI workloads
Created 1 year ago, updated 4 weeks ago
Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Luis Capelo (cofounder of Lightning AI), and 3 more.

LitServe by Lightning-AI (0.2%, 4k stars)
AI inference pipeline framework
Created 2 years ago, updated 2 weeks ago