aiconfigurator by ai-dynamo

LLM serving configuration optimization

Created 8 months ago
255 stars

Top 98.8% on SourcePulse

Project Summary

aiconfigurator addresses the complexity of configuring disaggregated LLM serving deployments by optimizing throughput under specific latency Service Level Agreements (SLAs) like Time to First Token (TTFT) and Time per Output Token (TPOT). It targets engineers and researchers needing to fine-tune LLM serving infrastructure, providing a strong starting configuration to maximize performance and efficiency.
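The core selection problem described above can be sketched as a constrained search: filter candidate deployments by the TTFT/TPOT latency SLAs, then pick the feasible one with the highest throughput. The configuration names, numbers, and thresholds below are illustrative assumptions, not real aiconfigurator output:

```python
# Hypothetical candidates: (name, throughput tokens/s/GPU, TTFT ms, TPOT ms).
# All values are made up for illustration.
candidates = [
    ("agg-tp4",     1800, 420, 14.0),
    ("disagg-1p2d", 2400, 310, 11.5),
    ("disagg-2p2d", 2100, 180,  9.8),
]

# Example latency SLAs (assumed targets, not project defaults).
TTFT_SLA_MS, TPOT_SLA_MS = 300, 12.0

# Keep only configurations that satisfy both SLAs ...
feasible = [c for c in candidates if c[2] <= TTFT_SLA_MS and c[3] <= TPOT_SLA_MS]

# ... and among those, maximize throughput.
best = max(feasible, key=lambda c: c[1])
print(best[0])  # prints "disagg-2p2d"
```

Note how the raw-throughput winner (`disagg-1p2d`) is rejected for missing the TTFT target; this is why SLA-aware search can pick a different configuration than a pure throughput benchmark would.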

How It Works

The tool models LLM inference by breaking it down into fundamental operations (e.g., GEMM, attention, communication) and collecting their execution times on target hardware. It then estimates end-to-end inference times by composing these operation costs using interpolation and extrapolation. aiconfigurator evaluates thousands of potential configurations, considering both aggregated and disaggregated serving paradigms, to identify optimal deployment settings and generate configuration files for frameworks like Dynamo.
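The cost-composition idea can be sketched as follows: per-operation timings measured at a few sizes on the target hardware are interpolated (or extrapolated) to unmeasured sizes, then summed into an end-to-end estimate. All timings and the two-op model below are illustrative assumptions, not real aiconfigurator data or its actual estimator:

```python
import numpy as np

# Hypothetical measured per-op latencies (ms) at sampled batch sizes.
measured_batch = np.array([1, 8, 32, 128])
gemm_ms = np.array([0.10, 0.35, 1.20, 4.60])
attn_ms = np.array([0.05, 0.30, 1.10, 4.20])

def op_cost(batch, xs, ys):
    """Estimate an op's latency at `batch` by interpolation, or by simple
    linear extrapolation past the last measured point."""
    if batch <= xs[-1]:
        return float(np.interp(batch, xs, ys))
    slope = (ys[-1] - ys[-2]) / (xs[-1] - xs[-2])
    return float(ys[-1] + slope * (batch - xs[-1]))

def layer_cost(batch):
    # Compose per-op estimates into a per-layer estimate by summation.
    return op_cost(batch, measured_batch, gemm_ms) + op_cost(batch, measured_batch, attn_ms)

# End-to-end estimate at an unmeasured batch size, scaled to 32 layers.
print(f"est. forward pass at batch 64: {32 * layer_cost(64):.1f} ms")
```

Evaluating thousands of candidate configurations against a model like this is cheap, which is what makes exhaustive search over aggregated and disaggregated layouts tractable without running each one on real hardware.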

Quick Start & Requirements

Installation is straightforward via PyPI: pip3 install aiconfigurator. Building from source requires Python 3.9+ and Git LFS. Webapp support can be added with pip3 install .[webapp]. The tool supports various GPU systems (e.g., H100, H200, B200, GB200) and inference backends (TRTLLM, vLLM, SGLang), with specific versions detailed in the support matrix. Links to the project's paper and CLI user guide are available.
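The installation paths above can be summarized as follows; the `pip3` commands come from the project docs, while the repository URL is an assumption based on the `ai-dynamo` organization name:

```shell
# Install from PyPI
pip3 install aiconfigurator

# Or build from source (requires Python 3.9+ and Git LFS).
# Repository URL is an assumption, not verified from the docs.
git clone https://github.com/ai-dynamo/aiconfigurator.git
cd aiconfigurator
pip3 install '.[webapp]'   # optional extra enabling the web application
```

The quotes around `.[webapp]` avoid shell glob expansion of the brackets, which some shells (e.g. zsh) would otherwise attempt.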

Highlighted Details

  • Offers both CLI and web application interfaces for configuration optimization.
  • Supports multiple inference backends (TRTLLM, vLLM, SGLang) across several GPU systems.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 132
  • Issues (30d): 8

Star History

  • 47 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 9 more.

LightLLM by ModelTC

4k stars (top 0.1%)
Python framework for LLM inference and serving
Created 2 years ago · Updated 1 day ago