llm-benchmark by lework

Benchmark LLM concurrency and stress test performance

Created 1 year ago
251 stars

Top 99.8% on SourcePulse

Summary

This project provides a concurrency performance testing tool for LLM services, designed for automated stress testing and detailed performance report generation. It targets engineers and researchers evaluating LLM deployments under varying loads, providing insight into how a service's throughput, latency, and stability hold up as pressure increases.

How It Works

The tool employs a multi-stage concurrency testing approach, systematically increasing load from low to high (1-300 concurrent requests) to identify performance bottlenecks. It automates data collection, analysis, and the generation of comprehensive statistical reports, supporting both short and long text scenarios. The core logic in llm_benchmark.py manages request handling, connection pooling, and detailed metric collection, including support for streaming responses.
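
The README does not reproduce the tool's internals, so the following is only a minimal sketch of what a staged concurrency test can look like, using asyncio and aiohttp. The endpoint URL, payload shape, stage sizes, and metric handling here are illustrative assumptions, not the project's actual llm_benchmark.py code:

```python
# Hypothetical sketch of multi-stage concurrency testing; not the project's
# actual llm_benchmark.py. Endpoint, payload, and stage sizes are assumptions.
import asyncio
import time

import aiohttp

STAGES = [1, 10, 50, 100, 300]  # concurrency levels, low to high

async def one_request(session: aiohttp.ClientSession, url: str, payload: dict) -> float:
    """Send a single request and return its latency in seconds."""
    start = time.perf_counter()
    async with session.post(url, json=payload) as resp:
        await resp.read()  # drain the body (a real tool would parse tokens)
    return time.perf_counter() - start

async def run_stage(url: str, payload: dict, concurrency: int) -> None:
    # Reuse one connection pool per stage, sized to the concurrency level.
    connector = aiohttp.TCPConnector(limit=concurrency)
    async with aiohttp.ClientSession(connector=connector) as session:
        tasks = [one_request(session, url, payload) for _ in range(concurrency)]
        latencies = await asyncio.gather(*tasks)
    latencies.sort()
    p50 = latencies[len(latencies) // 2]
    print(f"concurrency={concurrency:>3}  p50={p50:.3f}s  max={latencies[-1]:.3f}s")

async def main() -> None:
    url = "http://localhost:8000/v1/chat/completions"  # placeholder --llm_url
    payload = {"model": "my-model", "messages": [{"role": "user", "content": "Hi"}]}
    for concurrency in STAGES:
        await run_stage(url, payload, concurrency)

if __name__ == "__main__":
    asyncio.run(main())
```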

Quick Start & Requirements

  • Installation: Install dependencies via pip install -r requirements.txt. Alternatively, use Docker by building the image (docker build -t llm-benchmark .) or pulling a pre-built one (docker pull samge/llm-benchmark).
  • Prerequisites: a Python environment with the requirements.txt dependencies installed. Docker is optional but recommended for ease of use.
  • Running:
    • Full suite: python run_benchmarks.py --llm_url <URL> --api_key <KEY> --model <MODEL_NAME> [--use_long_context]
    • Single test: python llm_benchmark.py --llm_url <URL> --api_key <KEY> --model <MODEL_NAME> --num_requests <N> --concurrency <C>
    • Docker commands are also provided for both scenarios, mounting an output directory into the container ($PWD/output:/app/output).
  • Links: No specific documentation or demo links are provided in the README.

Highlighted Details

  • Automated multi-stage concurrency testing, scaling from 1 to 300 concurrent requests.
  • Comprehensive performance metrics with statistical analysis and visualized reports.
  • Support for both short and long context text scenarios.
  • Flexible configuration options and JSON output for further analysis.
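
The README does not document the schema of the JSON output mentioned above, so the sketch below is only a hedged illustration of post-processing: it assumes a list of per-request records with a "latency" field in seconds and computes summary percentiles from it:

```python
# Hypothetical post-processing of the tool's JSON output. The actual schema is
# not documented in the README; this assumes a list of per-request records
# with a "latency" field in seconds, written to an assumed output path.
import json
import statistics

with open("output/results.json") as f:  # path is an assumption
    records = json.load(f)

latencies = sorted(r["latency"] for r in records)

def percentile(sorted_values, p):
    """Nearest-rank percentile of a pre-sorted list."""
    idx = max(0, int(round(p / 100 * len(sorted_values))) - 1)
    return sorted_values[idx]

print(f"requests: {len(latencies)}")
print(f"mean:     {statistics.mean(latencies):.3f}s")
print(f"p50:      {percentile(latencies, 50):.3f}s")
print(f"p95:      {percentile(latencies, 95):.3f}s")
print(f"p99:      {percentile(latencies, 99):.3f}s")
```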

Maintenance & Community

The README does not mention maintainers, community channels (such as Discord or Slack), or a project roadmap.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The MIT license is permissive and generally compatible with commercial use and closed-source projects.

Limitations & Caveats

The tool requires a running LLM endpoint to test against, specified via --llm_url. Specific model names and API keys may be necessary depending on the LLM service. The README does not detail specific hardware requirements beyond standard Python/Docker environments, nor does it mention known bugs or alpha/beta status.
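
If no live service is available, one way to smoke-test the harness is to point --llm_url at a local stub. The sketch below is a hypothetical minimal OpenAI-style stub; whether the tool expects exactly this response shape, port, or path is an assumption, since the README does not specify the wire format:

```python
# Minimal local stub for smoke-testing; illustrative only. Whether the tool
# expects this exact OpenAI-style response shape is an assumption.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class StubHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Consume the request body so the client can finish sending.
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)
        body = json.dumps({
            "choices": [{"message": {"role": "assistant", "content": "ok"}}],
            "usage": {"prompt_tokens": 1, "completion_tokens": 1},
        }).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Then run e.g.: python llm_benchmark.py --llm_url http://localhost:8000 ...
    HTTPServer(("localhost", 8000), StubHandler).serve_forever()
```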

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 21 stars in the last 30 days
