Vidur is a high-fidelity LLM inference system simulator designed for researchers and engineers. It enables detailed performance analysis, capacity planning, and rapid prototyping of new scheduling algorithms and optimizations without requiring direct GPU access for most testing.
How It Works
Vidur simulates LLM inference by modeling request arrival, scheduling, execution, and resource utilization. It supports various workload traces and synthetic request generation, allowing users to evaluate metrics like Time To First Token (TTFT) and Total Request Time. The simulator's extensibility allows for the integration of novel scheduling algorithms and optimization techniques, such as speculative decoding, offering a flexible platform for system-level LLM research.
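The request lifecycle described above can be illustrated with a toy model. The sketch below is not Vidur's API: the `Request` class, the `simulate_fcfs` function, and the per-token cost constants are all hypothetical, chosen only to show how arrival, scheduling, and execution combine into TTFT and total request time. It assumes a single replica that serves one request at a time in arrival order, with the first output token produced at the end of prefill.

```python
from dataclasses import dataclass


@dataclass
class Request:
    arrival: float       # arrival time in seconds
    prefill_tokens: int  # prompt length in tokens
    decode_tokens: int   # output tokens to generate (at least 1)


def simulate_fcfs(requests, prefill_s_per_token=0.0002, decode_s_per_token=0.02):
    """Serve requests first-come-first-served on one replica.

    The per-token costs are made-up constants standing in for a real
    execution time predictor. Returns one (ttft, total_time) pair per
    request, in arrival order.
    """
    clock = 0.0  # time at which the replica next becomes free
    results = []
    for req in sorted(requests, key=lambda r: r.arrival):
        start = max(clock, req.arrival)  # queue if the replica is busy
        # Prefill processes the whole prompt and emits the first token.
        first_token = start + req.prefill_tokens * prefill_s_per_token
        # Each remaining output token costs one decode step.
        finish = first_token + (req.decode_tokens - 1) * decode_s_per_token
        results.append((first_token - req.arrival, finish - req.arrival))
        clock = finish
    return results


# Illustrative workload: two requests arriving 100 ms apart.
workload = [Request(0.0, 1000, 11), Request(0.1, 500, 6)]
metrics = simulate_fcfs(workload)
for ttft, total in metrics:
    print(f"TTFT={ttft:.3f}s total={total:.3f}s")
```

In this toy setup the second request's TTFT includes the time it spends queued behind the first, which is the kind of scheduling effect a simulator makes visible without running on a GPU.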
Quick Start & Requirements
Create the environment with mamba:
mamba env create -p ./env -f ./environment.yml
or set up a venv environment and install the dependencies:
python -m pip install -r requirements.txt
wandb integration for logging.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The simulator's accuracy depends on the fidelity of its execution time predictor, which may require initial profiling on the target hardware. Not every model is supported on every hardware configuration (e.g., H100, 8xA40).
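To make the dependence on profiling concrete, here is a minimal sketch of how a predictor could be fit from profiled measurements. It is an illustration only and does not describe Vidur's actual predictor: the `fit_linear` and `predict_ms` helpers are hypothetical, the model is a plain least-squares line, and the profiling numbers are made up. If the profiled points do not represent the target hardware, every simulated latency inherits that error.

```python
def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up profiling data: tokens in a batch vs. measured latency (ms).
profiled_tokens = [128, 256, 512, 1024]
profiled_latency_ms = [3.0, 5.0, 9.0, 17.0]

slope, intercept = fit_linear(profiled_tokens, profiled_latency_ms)


def predict_ms(num_tokens):
    """Predict latency for a batch size that was not profiled directly."""
    return slope * num_tokens + intercept


print(f"predicted latency for 2048 tokens: {predict_ms(2048):.1f} ms")
```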