ai-dynamo/aiconfigurator: LLM serving configuration optimization
Top 98.8% on SourcePulse
Summary
aiconfigurator addresses the complexity of configuring disaggregated LLM serving deployments by optimizing throughput under latency Service Level Agreements (SLAs) such as Time to First Token (TTFT) and Time per Output Token (TPOT). It targets engineers and researchers who need to tune LLM serving infrastructure, providing a strong starting configuration that maximizes performance and efficiency.
How It Works
The tool models LLM inference by breaking it down into fundamental operations (e.g., GEMM, attention, communication) and collecting their execution times on target hardware. It then estimates end-to-end inference times by composing these operation costs using interpolation and extrapolation. aiconfigurator evaluates thousands of potential configurations, considering both aggregated and disaggregated serving paradigms, to identify optimal deployment settings and generate configuration files for frameworks like Dynamo.
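The composition step described above can be sketched as follows. This is a minimal illustration, not aiconfigurator's actual implementation: the operation names, sample points, and timing numbers are all hypothetical, and the real tool models many more operations and configuration dimensions.

```python
import numpy as np

# Hypothetical per-operation execution times (ms) measured at sampled
# token counts on a target GPU -- illustrative numbers, not real data.
measured = {
    "gemm":      {"x": [128, 512, 2048], "t": [0.10, 0.35, 1.30]},
    "attention": {"x": [128, 512, 2048], "t": [0.05, 0.25, 1.10]},
    "comm":      {"x": [128, 512, 2048], "t": [0.02, 0.06, 0.20]},
}

def op_time(op: str, tokens: int) -> float:
    """Interpolate one operation's cost at an arbitrary token count."""
    d = measured[op]
    return float(np.interp(tokens, d["x"], d["t"]))

def estimate_latency(tokens: int, layers: int = 32) -> float:
    """Compose per-op costs into an end-to-end latency estimate (ms)."""
    per_layer = op_time("gemm", tokens) + op_time("attention", tokens)
    return layers * per_layer + op_time("comm", tokens)

def meets_sla(ttft_ms: float, tpot_ms: float,
              ttft_sla: float = 300.0, tpot_sla: float = 50.0) -> bool:
    """Screen a candidate configuration against TTFT/TPOT SLAs."""
    return ttft_ms <= ttft_sla and tpot_ms <= tpot_sla
```

A search over candidate configurations would call `estimate_latency` for each candidate's prefill and decode phases, discard those failing `meets_sla`, and keep the configuration with the highest estimated throughput.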
Quick Start & Requirements
Installation is straightforward via PyPI: pip3 install aiconfigurator. Building from source requires Python 3.9+ and Git LFS. Webapp support can be added with pip3 install .[webapp]. The tool supports various GPU systems (e.g., H100, H200, B200, GB200) and inference backends (TRTLLM, vLLM, SGLang), with specific versions detailed in the support matrix. Links to the project's paper and CLI user guide are available.
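The installation commands above, collected in one place (note that some shells, such as zsh, require quoting the extras specifier):

```shell
# Install from PyPI
pip3 install aiconfigurator

# Or, from a source checkout (requires Python 3.9+ and Git LFS),
# with optional webapp support:
pip3 install ".[webapp]"
```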