llm-benchmark by lework

Benchmark LLM concurrency and stress test performance

Created 1 year ago
251 stars

Top 99.8% on SourcePulse

Summary

This project provides a concurrency performance testing tool for LLM services, designed for automated stress testing and detailed performance report generation. It targets engineers and researchers evaluating LLM deployments under varying loads, providing insight into how a service's throughput, latency, and stability hold up as pressure increases.

How It Works

The tool employs a multi-stage concurrency testing approach, systematically increasing load from low to high (1-300 concurrent requests) to identify performance bottlenecks. It automates data collection, analysis, and the generation of comprehensive statistical reports, supporting both short and long text scenarios. The core logic in llm_benchmark.py manages request handling, connection pooling, and detailed metric collection, including support for streaming responses.
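
The README does not reproduce the tool's internals, so the following is only a minimal sketch of what a staged concurrency test can look like, using asyncio and aiohttp. The endpoint URL, payload shape, stage sizes, and metric handling here are illustrative assumptions, not the project's actual llm_benchmark.py code:

```python
# Hypothetical sketch of multi-stage concurrency testing; not the project's
# actual llm_benchmark.py. Endpoint, payload, and stage sizes are assumptions.
import asyncio
import time

import aiohttp

STAGES = [1, 10, 50, 100, 300]  # concurrency levels, low to high

async def one_request(session: aiohttp.ClientSession, url: str, payload: dict) -> float:
    """Send a single request and return its latency in seconds."""
    start = time.perf_counter()
    async with session.post(url, json=payload) as resp:
        await resp.read()  # drain the body (a real tool would parse tokens)
    return time.perf_counter() - start

async def run_stage(url: str, payload: dict, concurrency: int) -> None:
    # Reuse one connection pool per stage, sized to the concurrency level.
    connector = aiohttp.TCPConnector(limit=concurrency)
    async with aiohttp.ClientSession(connector=connector) as session:
        tasks = [one_request(session, url, payload) for _ in range(concurrency)]
        latencies = await asyncio.gather(*tasks)
    latencies.sort()
    p50 = latencies[len(latencies) // 2]
    print(f"concurrency={concurrency:>3}  p50={p50:.3f}s  max={latencies[-1]:.3f}s")

async def main() -> None:
    url = "http://localhost:8000/v1/chat/completions"  # placeholder --llm_url
    payload = {"model": "my-model", "messages": [{"role": "user", "content": "Hi"}]}
    for concurrency in STAGES:
        await run_stage(url, payload, concurrency)

if __name__ == "__main__":
    asyncio.run(main())
```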

Quick Start & Requirements

  • Installation: Install dependencies via pip install -r requirements.txt. Alternatively, use Docker by building the image (docker build -t llm-benchmark .) or pulling a pre-built one (docker pull samge/llm-benchmark).
  • Prerequisites: a Python environment with the requirements.txt dependencies installed. Docker is optional but recommended for ease of use.
  • Running:
    • Full suite: python run_benchmarks.py --llm_url <URL> --api_key <KEY> --model <MODEL_NAME> [--use_long_context]
    • Single test: python llm_benchmark.py --llm_url <URL> --api_key <KEY> --model <MODEL_NAME> --num_requests <N> --concurrency <C>
    • Docker commands are also provided for both scenarios, mounting an output directory into the container ($PWD/output:/app/output).
  • Links: No specific documentation or demo links are provided in the README.

Highlighted Details

  • Automated multi-stage concurrency testing, scaling from 1 to 300 concurrent requests.
  • Comprehensive performance metrics with statistical analysis and visualized reports.
  • Support for both short and long context text scenarios.
  • Flexible configuration options and JSON output for further analysis.
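
The README does not document the schema of the JSON output mentioned above, so the sketch below is only a hedged illustration of post-processing: it assumes a list of per-request records with a "latency" field in seconds and computes summary percentiles from it:

```python
# Hypothetical post-processing of the tool's JSON output. The actual schema is
# not documented in the README; this assumes a list of per-request records
# with a "latency" field in seconds, written to an assumed output path.
import json
import statistics

with open("output/results.json") as f:  # path is an assumption
    records = json.load(f)

latencies = sorted(r["latency"] for r in records)

def percentile(sorted_values, p):
    """Nearest-rank percentile of a pre-sorted list."""
    idx = max(0, int(round(p / 100 * len(sorted_values))) - 1)
    return sorted_values[idx]

print(f"requests: {len(latencies)}")
print(f"mean:     {statistics.mean(latencies):.3f}s")
print(f"p50:      {percentile(latencies, 50):.3f}s")
print(f"p95:      {percentile(latencies, 95):.3f}s")
print(f"p99:      {percentile(latencies, 99):.3f}s")
```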

Maintenance & Community

The README does not mention maintainers, community channels (such as Discord or Slack), or a project roadmap.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The MIT license is permissive and generally compatible with commercial use and closed-source projects.

Limitations & Caveats

The tool requires a running LLM endpoint to test against, specified via --llm_url. Specific model names and API keys may be necessary depending on the LLM service. The README does not detail specific hardware requirements beyond standard Python/Docker environments, nor does it mention known bugs or alpha/beta status.
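
If no live service is available, one way to smoke-test the harness is to point --llm_url at a local stub. The sketch below is a hypothetical minimal OpenAI-style stub; whether the tool expects exactly this response shape, port, or path is an assumption, since the README does not specify the wire format:

```python
# Minimal local stub for smoke-testing; illustrative only. Whether the tool
# expects this exact OpenAI-style response shape is an assumption.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class StubHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Consume the request body so the client can finish sending.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        body = json.dumps({
            "choices": [{"message": {"role": "assistant", "content": "ok"}}],
            "usage": {"prompt_tokens": 1, "completion_tokens": 1},
        }).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Then run e.g.: python llm_benchmark.py --llm_url http://localhost:8000 ...
    HTTPServer(("localhost", 8000), StubHandler).serve_forever()
```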

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 21 stars in the last 30 days
