model_analyzer by triton-inference-server

CLI tool for Triton Inference Server model optimization

created 4 years ago
482 stars

Top 64.7% on sourcepulse

Project Summary

Triton Model Analyzer is a CLI tool designed to help users optimize the configuration of models running on the Triton Inference Server. It assists in understanding compute and memory requirements, targeting users who deploy and manage models on Triton, and aims to improve inference performance and resource utilization.

How It Works

The tool offers several search modes for exploring the configuration space: Optuna (alpha) for hyperparameter optimization, Quick Search for heuristic exploration of batch size and dynamic batching, Automatic Brute Search for exhaustive parameter sweeps, and Manual Brute Search for user-defined sweeps. It supports single-model, ensemble, BLS, multi-model, and LLM deployments, and generates detailed reports on configuration trade-offs, filtered by QoS constraints.
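The search modes above are typically selected through a YAML profile config. The sketch below is illustrative only; the key names follow Model Analyzer's documented config format, but the model name and paths are placeholders:

```yaml
# Hypothetical Model Analyzer profile config (values are illustrative).
model_repository: /path/to/model_repository
profile_models:
  - my_model                      # placeholder model name
run_config_search_mode: quick     # alternatives: "brute", "optuna" (alpha)
```

A config like this would be passed to the CLI's profile subcommand; consult the Model Analyzer CLI documentation for the full set of options.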

Quick Start & Requirements

  • Installation: Typically via pip or by checking out a specific release branch (e.g., r24.12 for v1.47.0).
  • Prerequisites: Triton Inference Server, Python. Specific hardware requirements depend on the models being analyzed.
  • Documentation: Model Analyzer CLI, Examples and Tutorials.

Highlighted Details

  • Supports Optuna search for hyperparameter optimization (alpha).
  • Enables profiling for ensemble, BLS, multi-model, and LLM configurations.
  • Generates detailed and summary reports with QoS constraint filtering.
  • Offers multiple search strategies: heuristic, automatic brute-force, and manual sweeps.
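QoS constraint filtering, mentioned above, is also expressed in the YAML config. This is a hedged sketch of the assumed format (metric names and thresholds are illustrative); configurations violating the constraints are excluded from the reports:

```yaml
# Illustrative QoS constraints (assumed format; verify against the docs):
# keep only configurations with p99 latency under 100 ms and
# throughput of at least 500 inferences/sec.
profile_models:
  my_model:                # placeholder model name
    constraints:
      perf_latency_p99:
        max: 100
      perf_throughput:
        min: 500
```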

Maintenance & Community

The project is part of the triton-inference-server organization. Users are encouraged to report problems and ask questions via GitHub issues.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking would require clarification of the licensing terms.

Limitations & Caveats

Model Analyzer support is deprecated and will be excluded from Triton Inference Server starting with version 25.05. The Optuna search mode is an alpha release.

Health Check

  • Last commit: 15 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 9 stars in the last 90 days
