model_analyzer by triton-inference-server

CLI tool for Triton Inference Server model optimization

created 4 years ago
482 stars

Top 64.7% on sourcepulse

Project Summary

Triton Model Analyzer is a CLI tool designed to help users optimize the configuration of models running on the Triton Inference Server. It assists in understanding compute and memory requirements, targeting users who deploy and manage models on Triton, and aims to improve inference performance and resource utilization.

How It Works

The tool offers several search modes for exploring the configuration space: Optuna (alpha) for hyperparameter optimization, Quick Search for heuristic exploration of batch size and dynamic batching, Automatic Brute Search for exhaustive parameter sweeps, and Manual Brute Search for user-defined sweeps. It supports single-model, ensemble, BLS, multi-model, and LLM deployments, and generates detailed reports on configuration trade-offs, filtered by QoS constraints.
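The search modes above are typically selected through a YAML profile config. The sketch below is illustrative only; the key names follow Model Analyzer's documented config format, but the model name and paths are placeholders:

```yaml
# Hypothetical Model Analyzer profile config (values are illustrative).
model_repository: /path/to/model_repository
profile_models:
  - my_model                      # placeholder model name
run_config_search_mode: quick     # alternatives: "brute", "optuna" (alpha)
```

A config like this would be passed to the CLI's profile subcommand; consult the Model Analyzer CLI documentation for the full set of options.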

Quick Start & Requirements

  • Installation: Typically via pip or by checking out a specific release branch (e.g., r24.12 for v1.47.0).
  • Prerequisites: Triton Inference Server, Python. Specific hardware requirements depend on the models being analyzed.
  • Documentation: Model Analyzer CLI, Examples and Tutorials.

Highlighted Details

  • Supports Optuna search for hyperparameter optimization (alpha).
  • Enables profiling for ensemble, BLS, multi-model, and LLM configurations.
  • Generates detailed and summary reports with QoS constraint filtering.
  • Offers multiple search strategies: heuristic, automatic brute-force, and manual sweeps.
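QoS constraint filtering, mentioned above, is also expressed in the YAML config. This is a hedged sketch of the assumed format (metric names and thresholds are illustrative); configurations violating the constraints are excluded from the reports:

```yaml
# Illustrative QoS constraints (assumed format; verify against the docs):
# keep only configurations with p99 latency under 100 ms and
# throughput of at least 500 inferences/sec.
profile_models:
  my_model:                # placeholder model name
    constraints:
      perf_latency_p99:
        max: 100
      perf_throughput:
        min: 500
```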

Maintenance & Community

The project is part of the triton-inference-server organization. Users are encouraged to report problems and ask questions via GitHub issues.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking would require clarification of the licensing terms.

Limitations & Caveats

Model Analyzer support is deprecated and will be excluded from Triton Inference Server starting with version 25.05. The Optuna search mode is an alpha release.

Health Check

  • Last commit: 15 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 9 stars in the last 90 days
