llmperf by ray-project

Validation and benchmarking library for LLM APIs

created 1 year ago
971 stars

Top 38.8% on sourcepulse

Project Summary

LLMPerf is a library for evaluating the performance and correctness of Large Language Model (LLM) APIs. It is designed for researchers and engineers who need to benchmark different LLM providers and models under various load conditions. The tool helps quantify inter-token latency, generation throughput, and response accuracy.

How It Works

LLMPerf utilizes Ray for distributed execution, enabling it to simulate concurrent requests to LLM APIs. It offers two primary test types: a load test measuring latency and throughput, and a correctness test verifying response accuracy against specific prompts. Token counting is standardized using LlamaTokenizer for consistent comparisons across different LLM backends.
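To make the two load-test metrics concrete, here is a minimal sketch of how inter-token latency and generation throughput can be derived from per-token arrival times. The helper name and the timestamps are illustrative, not part of llmperf's API:

```python
import statistics

def summarize_stream(token_timestamps):
    """Hypothetical helper: compute mean inter-token latency (s) and
    generation throughput (tokens/s) from per-token arrival times."""
    # Gap between each consecutive pair of tokens.
    gaps = [b - a for a, b in zip(token_timestamps, token_timestamps[1:])]
    inter_token_latency = statistics.mean(gaps)
    # Tokens generated after the first one, divided by elapsed time.
    duration = token_timestamps[-1] - token_timestamps[0]
    throughput = (len(token_timestamps) - 1) / duration
    return inter_token_latency, throughput

# Five tokens arriving over 0.2 s: ~0.05 s mean gap, ~20 tokens/s.
itl, tps = summarize_stream([0.0, 0.05, 0.10, 0.16, 0.20])
```

llmperf aggregates statistics like these across many concurrent Ray workers, which is why results are reported as distributions rather than single numbers.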

Quick Start & Requirements

  • Install: git clone https://github.com/ray-project/llmperf.git && cd llmperf && pip install -e .
  • Prerequisites: Python, Ray, Transformers (LlamaTokenizerFast). API keys and endpoint configurations are required for specific LLM providers.
  • Documentation: LLMPerf README
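As a sketch of a typical invocation against an OpenAI-compatible endpoint (the environment variables and flag names below follow the llmperf README; the endpoint URL and model are placeholders — consult the README for the full flag list):

```shell
# Point llmperf at an OpenAI-compatible endpoint.
export OPENAI_API_KEY="your-key-here"
export OPENAI_API_BASE="https://api.example.com/v1"  # placeholder endpoint

# Load test: 100 requests, 2 concurrent, measuring
# inter-token latency and throughput.
python token_benchmark_ray.py \
  --model "meta-llama/Llama-2-7b-chat-hf" \
  --llm-api openai \
  --num-concurrent-requests 2 \
  --max-num-completed-requests 100 \
  --results-dir result_outputs
```

Results are written as JSON under the chosen results directory for offline analysis.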

Highlighted Details

  • Supports benchmarking of OpenAI-compatible APIs, Anthropic, TogetherAI, Hugging Face, Vertex AI, and SageMaker endpoints.
  • Integrates with LiteLLM for broad LLM provider compatibility.
  • Load tests measure inter-token latency and throughput using Shakespearean sonnet prompts.
  • Correctness tests validate specific prompt-response patterns, like number conversion.
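In the spirit of the number-conversion correctness check described above, a response validator might accept an answer if the expected value appears in the model's reply in either plain or comma-grouped digit form. The function below is a hypothetical illustration, not llmperf's actual checker:

```python
def check_number_response(expected: int, response: str) -> bool:
    """Hypothetical validator: does the response contain the expected
    number, written as plain digits or with comma grouping?"""
    digits = str(expected)          # e.g. "1200"
    grouped = f"{expected:,}"       # e.g. "1,200"
    return digits in response or grouped in response

check_number_response(1200, "The answer is 1,200.")   # matches grouped form
check_number_response(1200, "It is twelve hundred.")  # spelled-out form fails
```

A production checker would also need to handle spelled-out numbers and surrounding punctuation; this sketch only shows the pattern-matching idea.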

Maintenance & Community

  • Developed by the Ray Project.
  • Legacy codebase available at llmperf-legacy.

Licensing & Compatibility

  • License: Apache License 2.0.
  • Compatibility: Suitable for commercial use and integration with closed-source applications.

Limitations & Caveats

Performance results are sensitive to backend implementation, network conditions, and time of day, and may not directly correlate with all user workloads. Vertex AI and SageMaker do not return token counts, necessitating tokenization via LlamaTokenizer for these services.

Health Check

  • Last commit: 7 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 94 stars in the last 90 days
