casys-kaist / LLMServingSim: Unified simulator for heterogeneous LLM serving infrastructure
Top 99.0% on SourcePulse
Summary
LLMServingSim 2.0 is a unified simulator for heterogeneous and disaggregated LLM serving infrastructure. It lets engineers and researchers predict and analyze LLM serving performance across diverse hardware and parallelism strategies, aiding infrastructure design and optimization.
How It Works
The simulator integrates a vLLM-based layerwise profiler that captures real CUDA kernel timings. This performance data feeds a core engine that models heterogeneous and disaggregated LLM serving. Key features include skew-aware attention for heterogeneous decode batches, multi-hardware support, and per-rank Mixture-of-Experts (MoE) latency modeling with DP+EP parallelism via ASTRA-Sim. It also supports vLLM-style request routing.
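Skew-aware modeling matters because decode-time attention cost scales with each request's KV-cache length, so a heterogeneous batch's step latency tracks its longest sequence rather than the average. A minimal sketch of that idea (the function names, cost model, and constants below are illustrative assumptions, not the simulator's actual code):

```python
def attention_decode_latency_us(kv_lens, base_us=20.0, per_token_us=0.01):
    """Estimate decode-step attention latency for one batch.

    Skew-aware: a fused decode kernel finishes only when the request
    with the longest KV cache finishes, so latency is driven by
    max(kv_lens), not the mean. All constants are illustrative.
    """
    if not kv_lens:
        return 0.0
    return base_us + per_token_us * max(kv_lens)


def skew_oblivious_latency_us(kv_lens, base_us=20.0, per_token_us=0.01):
    """Baseline that assumes cost follows the mean KV length."""
    if not kv_lens:
        return 0.0
    return base_us + per_token_us * (sum(kv_lens) / len(kv_lens))


# A heterogeneous decode batch: one long-context request dominates.
batch = [128, 256, 512, 8192]
print(attention_decode_latency_us(batch))   # driven by the 8192-token request
print(skew_oblivious_latency_us(batch))     # underestimates the step time
```

The gap between the two estimates grows with batch skew, which is why a skew-oblivious model systematically underpredicts latency for mixed-length decode batches.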
Quick Start & Requirements
Installation uses Docker (scripts/docker-sim.sh, scripts/docker-vllm.sh) or a bare-metal vLLM installer (scripts/install-vllm.sh). ASTRA-Sim is compiled with ./scripts/compile.sh. Cluster configurations (configs/cluster/) define topology, hardware, memory, and interconnects, and support per-layer placement and PIM. Supported hardware includes RTXPRO6000, with profiling data for Llama-3.1-8B and Qwen3 variants. Workload datasets are JSONL files under workloads/.
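The workloads/ directory holds request traces in JSONL form, one JSON object per line. A minimal loader sketch (the field names arrival_time_ms, input_len, and output_len are illustrative assumptions; inspect the bundled traces for the schema the simulator actually expects):

```python
import json


def load_workload(path):
    """Read a JSONL request trace, one JSON object per line.

    Field names used below are assumed for illustration only;
    check the files under workloads/ for the real schema.
    """
    requests = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:  # tolerate blank lines
                continue
            requests.append(json.loads(line))
    return requests


# Example: write and re-read a two-request trace.
trace = [
    {"arrival_time_ms": 0, "input_len": 512, "output_len": 128},
    {"arrival_time_ms": 15, "input_len": 2048, "output_len": 256},
]
with open("demo_trace.jsonl", "w") as f:
    for req in trace:
        f.write(json.dumps(req) + "\n")

print(len(load_workload("demo_trace.jsonl")))  # 2
```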
Maintenance & Community
The project is under active development, and the current branch is noted as potentially unstable. Contributions via pull requests are welcome. The work was published in CAL 2025 and IISWC 2024, indicating academic relevance.
Licensing & Compatibility
The README does not explicitly state a software license, so further investigation is required before commercial use or integration into closed-source projects.
Limitations & Caveats
The current development branch is marked as potentially unstable. The absence of a stated software license is a significant caveat for adoption, particularly in commercial contexts.