aitune by ai-dynamo

Inference toolkit for optimizing PyTorch models on NVIDIA GPUs

Created 2 months ago
260 stars

Top 97.4% on SourcePulse

Project Summary

NVIDIA AITune is an inference toolkit designed for tuning and deploying deep learning models, optimized for NVIDIA GPUs. It addresses the need for significantly improved inference speed and efficiency across diverse AI workloads by automating the compilation and conversion of PyTorch models and pipelines. Using a unified Python API, it enables seamless tuning with backends like TensorRT, Torch-TensorRT, TorchAO, and Torch Inductor, preparing models for production with minimal code changes.

How It Works

AITune optimizes PyTorch models at the nn.Module level via two modes: Ahead-of-Time (AOT) for greater control and Just-in-Time (JIT) for zero code modification. AOT requires explicit code wrapping, while JIT can be enabled via an environment variable or import for on-the-fly tuning. The system supports multiple backends (TensorRT, Torch-TensorRT, TorchAO, Torch Inductor) and employs strategies like FirstWinsStrategy or HighestThroughputStrategy to automatically select optimal backend configurations based on performance metrics. It automates model export, conversion, correctness testing, and profiling.
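
The two selection strategies can be illustrated with a small, generic sketch. The candidate tuples and helper functions below are hypothetical stand-ins for illustration only, not AITune's actual API; FirstWinsStrategy keeps the first backend that works, while HighestThroughputStrategy benchmarks all working backends and keeps the fastest.

```python
# Hypothetical stand-ins illustrating the two strategies; AITune's real API differs.

def first_wins(candidates):
    """Return the first candidate backend that compiled successfully."""
    for name, compile_ok, throughput in candidates:
        if compile_ok:
            return name
    return None

def highest_throughput(candidates):
    """Profile every working candidate and keep the fastest one."""
    working = [(throughput, name) for name, ok, throughput in candidates if ok]
    return max(working)[1] if working else None

# Example candidates: (backend name, compiled successfully?, measured images/sec)
candidates = [
    ("torch-inductor", True, 850.0),
    ("torch-tensorrt", True, 1420.0),
    ("tensorrt", False, 0.0),  # e.g. an unsupported op blocked conversion
]

print(first_wins(candidates))          # first backend that worked
print(highest_throughput(candidates))  # fastest backend that worked
```

The trade-off mirrors the document's description: FirstWins avoids profiling every backend, while HighestThroughput pays that cost to maximize performance.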

Quick Start & Requirements

  • Installation: Recommended: pip install --extra-index-url https://pypi.nvidia.com aitune. Source install supported.
  • Prerequisites: Linux (Ubuntu 22.04+ recommended), Python 3.10+, PyTorch 2.7+, TensorRT 10.5.0+ (for the TensorRT backend), NVIDIA GPU.
  • Environment: NGC Containers for PyTorch suggested.
  • Links: Examples Catalog, Documentation, GitHub Issues.
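
The install command and JIT/NVTX environment variables documented in this summary combine into a minimal setup; `infer.py` below is a placeholder for your own inference script.

```shell
# Install AITune from NVIDIA's package index (as documented above)
pip install --extra-index-url https://pypi.nvidia.com aitune

# Enable JIT tuning with no code changes, plus optional NVTX profiling;
# "infer.py" is a placeholder for your own PyTorch inference script.
export AUTOWRAPT_BOOTSTRAP=aitune_enable_jit_tuning
export NVTX_ENABLE=1
python infer.py
```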

Highlighted Details

  • Ease-of-use: Single-line code integration for tuning paths.
  • Wide Backend Support: TensorRT, Torch-TensorRT, TorchAO, Torch Inductor.
  • Model & Pipeline Tuning: Enhances performance for ResNet, BERT, and Stable Diffusion pipelines.
  • JIT Tuning: Enables tuning without code mods via AUTOWRAPT_BOOTSTRAP=aitune_enable_jit_tuning or import aitune.torch.jit.enable.
  • NVTX Profiling: Integrates NVTX for detailed profiling (export NVTX_ENABLE=1).
  • Correctness Testing: Validates tuned models with provided data.

Maintenance & Community

  • Status: First release; API subject to change.
  • Community: Primarily via GitHub Issues. Links to Contributing and Changelog available.

Licensing & Compatibility

  • License: Not detailed in the provided README.
  • Compatibility: Targets PyTorch models on NVIDIA GPU hardware.

Limitations & Caveats

  • First Release: API is unstable.
  • JIT Mode: Lacks caching, requires re-tuning on each start, and cannot perform direct benchmarking due to unknown batch sizes, limiting strategy support.
  • Backend Specifics: Not all backends are universally compatible with all models due to underlying compilation technology constraints.

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 249 stars in the last 30 days

Explore Similar Projects

Starred by George Hotz (author of tinygrad; founder of the tiny corp, comma.ai) and Carol Willing (core contributor to CPython, Jupyter).

ai-performance-engineering by cfregly (1.2%, 1k stars)
AI Systems Performance Engineering for modern AI workloads
Created 1 year ago, updated 4 weeks ago
Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Luis Capelo (cofounder of Lightning AI), and 3 more.

LitServe by Lightning-AI (0.2%, 4k stars)
AI inference pipeline framework
Created 2 years ago, updated 2 weeks ago