aitune  by ai-dynamo

Inference toolkit for optimizing PyTorch models on NVIDIA GPUs

Created 3 months ago
274 stars

Top 94.3% on SourcePulse

Project Summary

NVIDIA AITune is an inference toolkit for tuning and deploying deep learning models on NVIDIA GPUs. It aims to improve inference speed and efficiency by automating the compilation and conversion of PyTorch models and pipelines. Through a single Python API, it tunes models with backends such as TensorRT, Torch-TensorRT, TorchAO, and Torch Inductor, and prepares them for production with minimal code changes.

How It Works

AITune optimizes PyTorch models at the nn.Module level via two modes: Ahead-of-Time (AOT) for greater control and Just-in-Time (JIT) for zero code modification. AOT requires explicit code wrapping, while JIT can be enabled via an environment variable or import for on-the-fly tuning. The system supports multiple backends (TensorRT, Torch-TensorRT, TorchAO, Torch Inductor) and employs strategies like FirstWinsStrategy or HighestThroughputStrategy to automatically select optimal backend configurations based on performance metrics. It automates model export, conversion, correctness testing, and profiling.
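The idea behind the two strategies named above can be illustrated with a small sketch that does not depend on any library. It does not use AITune's API: `Candidate`, `first_wins`, and `highest_throughput` are hypothetical names, the throughput figures are made-up stand-ins for real profiling results, and the reading of "first wins" as "the first backend that builds and passes the correctness check" is an assumption based on the strategy's name.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record of one backend attempt (not an AITune type)."""
    backend: str
    built: bool                # did export/conversion succeed?
    passed_check: bool         # did outputs match the reference model?
    throughput: float = 0.0    # samples per second, illustrative only

    @property
    def usable(self) -> bool:
        return self.built and self.passed_check


def first_wins(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Return the first usable candidate in priority order, or None."""
    for candidate in candidates:
        if candidate.usable:
            return candidate
    return None


def highest_throughput(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Return the usable candidate with the best throughput, or None."""
    usable = [c for c in candidates if c.usable]
    return max(usable, key=lambda c: c.throughput, default=None)


# Made-up results for illustration; real numbers would come from profiling.
results = [
    Candidate("tensorrt", built=False, passed_check=False),
    Candidate("torch_tensorrt", built=True, passed_check=True, throughput=900.0),
    Candidate("torch_inductor", built=True, passed_check=True, throughput=1200.0),
]

print(first_wins(results).backend)
print(highest_throughput(results).backend)
```

In this sketch the first strategy stops at the earliest backend that works, which keeps tuning time short, while the second needs a measurement for every usable backend before it can choose.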

Quick Start & Requirements

  • Installation: pip install --extra-index-url https://pypi.nvidia.com aitune (recommended). Installing from source is also supported.
  • Prerequisites: Linux (Ubuntu 22.04+ recommended), Python 3.10+, PyTorch 2.7+, TensorRT 10.5.0+ (for the TensorRT backend), NVIDIA GPU.
  • Environment: NGC Containers for PyTorch suggested.
  • Links: Examples Catalog, Documentation, GitHub Issues.
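Before installing, it can help to confirm that an environment meets the version floors listed above. The following is a minimal sketch using only the Python standard library. The helper names (`parse_version`, `meets_minimum`, `installed_version`, `check_environment`) are hypothetical, and the distribution names `torch` and `tensorrt` are assumptions about how those packages are registered in the environment.

```python
import sys
from importlib import metadata
from typing import Dict, Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.7.1+cu126' into (2, 7, 1); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically, padding the shorter one with zeros."""
    inst, need = parse_version(installed), parse_version(minimum)
    width = max(len(inst), len(need))
    inst += (0,) * (width - len(inst))
    need += (0,) * (width - len(need))
    return inst >= need


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


# Minimums taken from the prerequisites list; dist names are assumptions.
MINIMUMS = {"torch": "2.7", "tensorrt": "10.5.0"}


def check_environment() -> Dict[str, bool]:
    """Report which prerequisites are satisfied in the current interpreter."""
    report = {"python": sys.version_info >= (3, 10)}
    for dist, minimum in MINIMUMS.items():
        found = installed_version(dist)
        report[dist] = found is not None and meets_minimum(found, minimum)
    return report


print(check_environment())
```

Comparing integer tuples rather than strings avoids ranking a release such as 2.10 below 2.7. The check covers package versions only; it does not detect the GPU or the operating system.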

Highlighted Details

  • Ease-of-use: Single-line code integration for tuning paths.
  • Wide Backend Support: TensorRT, Torch-TensorRT, TorchAO, Torch Inductor.
  • Model & Pipeline Tuning: Enhances performance for ResNet, BERT, and Stable Diffusion pipelines.
  • JIT Tuning: Enables tuning without code modifications via AUTOWRAPT_BOOTSTRAP=aitune_enable_jit_tuning or import aitune.torch.jit.enable.
  • NVTX Profiling: Integrates NVTX for detailed profiling (export NVTX_ENABLE=1).
  • Correctness Testing: Validates tuned models with provided data.
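The environment variables mentioned in the list above can also be set programmatically when launching an unmodified script. This is a minimal sketch, assuming AITune is installed in the target interpreter so that the bootstrap hook is available. The helpers `jit_tuning_env` and `run_with_jit_tuning` are hypothetical names, not part of AITune; only the variable names and values come from the list.

```python
import os
import subprocess
import sys
from typing import Dict, List, Mapping, Optional


def jit_tuning_env(
    base: Optional[Mapping[str, str]] = None, nvtx: bool = False
) -> Dict[str, str]:
    """Copy an environment and add the JIT tuning (and optional NVTX) settings."""
    env = dict(os.environ if base is None else base)
    env["AUTOWRAPT_BOOTSTRAP"] = "aitune_enable_jit_tuning"
    if nvtx:
        env["NVTX_ENABLE"] = "1"
    return env


def run_with_jit_tuning(argv: List[str], nvtx: bool = False) -> int:
    """Run a Python script in a child process with JIT tuning enabled."""
    completed = subprocess.run(
        [sys.executable, *argv], env=jit_tuning_env(nvtx=nvtx)
    )
    return completed.returncode


# Build the environment without launching anything, to show what is set.
demo = jit_tuning_env(base={"PATH": "/usr/bin"}, nvtx=True)
print(demo["AUTOWRAPT_BOOTSTRAP"], demo["NVTX_ENABLE"])
```

A call such as `run_with_jit_tuning(["infer.py"])` would then start a script with tuning enabled, where `infer.py` is a placeholder for an existing inference script. Setting the variables on a copy keeps the parent process's environment unchanged.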

Maintenance & Community

  • Status: First release; API subject to change.
  • Community: Primarily via GitHub Issues. Links to Contributing and Changelog available.

Licensing & Compatibility

  • License: Not detailed in the provided README.
  • Compatibility: Targets PyTorch models on NVIDIA GPU hardware.

Limitations & Caveats

  • First Release: API is unstable.
  • JIT Mode: Lacks caching, so models are re-tuned on every start. Because batch sizes are not known in advance, it cannot benchmark directly, which limits the strategies it supports.
  • Backend Specifics: Not all backends are universally compatible with all models due to underlying compilation technology constraints.
  • PyTorch Version: The README requires PyTorch 2.7 or newer, so environments pinned to older releases need an upgrade.

Health Check

  • Last Commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 5 stars in the last 30 days
