LLM inference provider benchmark using LLMPerf
This repository provides a leaderboard of Large Language Model (LLM) inference providers, benchmarking their performance and reliability. It is aimed at developers and users who want to compare the throughput and latency of different providers when making deployment decisions.
How It Works
The project benchmarks LLM inference providers using the LLMPerf framework. It measures output token throughput (tokens per second) and time to first token (TTFT, in seconds). Benchmarks use a fixed configuration: 150 total requests, 5 concurrent requests, a mean input length of 550 tokens, and a mean output length of 150 tokens. This provides a standardized comparison across providers.
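As an illustration, a run with that configuration could be launched roughly as follows. The flag names mirror LLMPerf's token_benchmark_ray.py script, but this is a sketch rather than the leaderboard's exact command: the model name, the --llm-api value, and the results directory are placeholders, and the current interface should be checked against the LLMPerf repository.

python token_benchmark_ray.py \
  --model "<provider-specific-model-id>" \
  --llm-api openai \
  --max-num-completed-requests 150 \
  --num-concurrent-requests 5 \
  --mean-input-tokens 550 \
  --mean-output-tokens 150 \
  --results-dir raw_data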
Quick Start & Requirements
Run python token_benchmark_ray.py from the LLMPerf repository, specifying the model and provider details. Raw benchmark outputs are stored in the raw_data folder.
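For context, the summary numbers on the leaderboard come down to simple aggregates over per-request measurements like those collected in raw_data. The sketch below shows one way such aggregates could be computed; the record structure and field names are hypothetical, not the leaderboard's actual schema.

from statistics import mean

# Hypothetical per-request records: time to first token, end-to-end latency,
# and number of generated tokens for each completed request.
requests = [
    {"ttft_s": 0.42, "end_to_end_s": 3.1, "output_tokens": 150},
    {"ttft_s": 0.55, "end_to_end_s": 2.8, "output_tokens": 142},
    {"ttft_s": 0.48, "end_to_end_s": 3.4, "output_tokens": 157},
]

# Mean time to first token across requests (seconds).
mean_ttft = mean(r["ttft_s"] for r in requests)

# Mean per-request output token throughput: tokens generated per second
# of end-to-end request time, averaged across requests.
mean_throughput = mean(r["output_tokens"] / r["end_to_end_s"] for r in requests)

print(f"mean TTFT: {mean_ttft:.2f} s")
print(f"mean output token throughput: {mean_throughput:.1f} tokens/s")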
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The results are subject to potential sources of bias, including differences in provider backend implementations, time of day, client location (which affects TTFT measurements), and existing system load or provider traffic. The measurements are a proxy and may not correlate perfectly with every user workload.