ai-dynamo/aiconfigurator: LLM serving configuration optimization
Top 98.8% on SourcePulse
Summary
aiconfigurator addresses the complexity of configuring disaggregated LLM serving deployments by optimizing throughput under latency Service Level Agreements (SLAs) such as Time to First Token (TTFT) and Time per Output Token (TPOT). It targets engineers and researchers who need to tune LLM serving infrastructure, providing a strong starting configuration that maximizes performance and efficiency.
How It Works
The tool models LLM inference by breaking it down into fundamental operations (e.g., GEMM, attention, communication) and collecting their execution times on target hardware. It then estimates end-to-end inference times by composing these operation costs using interpolation and extrapolation. aiconfigurator evaluates thousands of potential configurations, considering both aggregated and disaggregated serving paradigms, to identify optimal deployment settings and generate configuration files for frameworks like Dynamo.
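The composition step described above can be sketched as follows. This is a minimal illustration, not aiconfigurator's actual implementation: the operation names, sample points, and timing numbers are all hypothetical, and the real tool models many more operations and configuration dimensions.

```python
import numpy as np

# Hypothetical per-operation execution times (ms) measured at sampled
# token counts on a target GPU -- illustrative numbers, not real data.
measured = {
    "gemm":      {"x": [128, 512, 2048], "t": [0.10, 0.35, 1.30]},
    "attention": {"x": [128, 512, 2048], "t": [0.05, 0.25, 1.10]},
    "comm":      {"x": [128, 512, 2048], "t": [0.02, 0.06, 0.20]},
}

def op_time(op: str, tokens: int) -> float:
    """Interpolate one operation's cost at an arbitrary token count."""
    d = measured[op]
    return float(np.interp(tokens, d["x"], d["t"]))

def estimate_latency(tokens: int, layers: int = 32) -> float:
    """Compose per-op costs into an end-to-end latency estimate (ms)."""
    per_layer = op_time("gemm", tokens) + op_time("attention", tokens)
    return layers * per_layer + op_time("comm", tokens)

def meets_sla(ttft_ms: float, tpot_ms: float,
              ttft_sla: float = 300.0, tpot_sla: float = 50.0) -> bool:
    """Screen a candidate configuration against TTFT/TPOT SLAs."""
    return ttft_ms <= ttft_sla and tpot_ms <= tpot_sla
```

A search over candidate configurations would call `estimate_latency` for each candidate's prefill and decode phases, discard those failing `meets_sla`, and keep the configuration with the highest estimated throughput.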
Quick Start & Requirements
Installation is straightforward via PyPI: pip3 install aiconfigurator. Building from source requires Python 3.9+ and Git LFS. Webapp support can be added with pip3 install .[webapp]. The tool supports various GPU systems (e.g., H100, H200, B200, GB200) and inference backends (TRTLLM, vLLM, SGLang), with specific versions detailed in the support matrix. Links to the project's paper and CLI user guide are available.
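The installation commands above, collected in one place (note that some shells, such as zsh, require quoting the extras specifier):

```shell
# Install from PyPI
pip3 install aiconfigurator

# Or, from a source checkout (requires Python 3.9+ and Git LFS),
# with optional webapp support:
pip3 install ".[webapp]"
```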