llmperf-leaderboard by ray-project

LLM inference provider benchmark using LLMPerf

created 1 year ago
463 stars

Top 66.4% on sourcepulse

View on GitHub
Project Summary

This repository provides a leaderboard for Large Language Model (LLM) inference providers, benchmarking their performance and reliability. It targets developers and users seeking to understand and compare the throughput and latency of various LLM providers, aiding in deployment decisions.

How It Works

The project benchmarks LLM inference providers with the LLMPerf framework, measuring two metrics: output token throughput (tokens per second) and Time to First Token (TTFT, in seconds). Every provider is tested under the same configuration: 150 total requests, 5 concurrent requests, a mean of 550 input tokens, and a mean of 150 output tokens per request. This fixed setup gives a standardized comparison across providers.
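
To make the two metrics concrete, below is a minimal sketch of how they can be derived from per-request timing records. The field names (start, first_token, end, output_tokens) are illustrative placeholders rather than the exact schema of the leaderboard's raw data, and the leaderboard's own aggregation may differ in detail.

    from statistics import mean

    # Illustrative per-request records: wall-clock timestamps in seconds plus
    # the number of tokens generated for that request (hypothetical values).
    requests = [
        {"start": 0.00, "first_token": 0.45, "end": 3.10, "output_tokens": 148},
        {"start": 0.00, "first_token": 0.52, "end": 3.40, "output_tokens": 152},
    ]

    # Time to First Token (seconds): delay from sending the request until the
    # first streamed token arrives.
    ttft = [r["first_token"] - r["start"] for r in requests]

    # Output token throughput (tokens/second): generated tokens divided by the
    # end-to-end request time.
    throughput = [r["output_tokens"] / (r["end"] - r["start"]) for r in requests]

    print(f"mean TTFT: {mean(ttft):.2f} s")
    print(f"mean output throughput: {mean(throughput):.1f} tokens/s")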

Quick Start & Requirements

  • Install/Run: Use the command template python token_benchmark_ray.py from the LLMPerf repository, specifying model and provider details (see the example invocation after this list).
  • Prerequisites: Python and the LLMPerf framework; benchmarks were run from an AWS EC2 instance (e.g., i4i.large) in us-west-2.
  • Resources: Raw data is available in the raw_data folder.
  • Details: Benchmark configurations and results for Llama-2 7B, 13B, and 70B models are provided.
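
A representative invocation, matching the configuration described above, might look like the following. The flag names follow the LLMPerf repository's token_benchmark_ray.py but should be checked against the script's --help for the version you install; the API key, endpoint, and model name are placeholders.

    # Placeholder credentials and endpoint for the provider under test
    # (the "openai" llm-api expects an OpenAI-compatible endpoint).
    export OPENAI_API_KEY="<provider-api-key>"
    export OPENAI_API_BASE="https://<provider-endpoint>/v1"

    # 150 total requests, 5 at a time, ~550 mean input tokens, ~150 mean output tokens.
    python token_benchmark_ray.py \
      --model "<provider-model-name>" \
      --llm-api openai \
      --mean-input-tokens 550 \
      --mean-output-tokens 150 \
      --max-num-completed-requests 150 \
      --num-concurrent-requests 5 \
      --results-dir result_outputs

Per-request results land in the chosen --results-dir; the leaderboard's published runs are collected in the repository's raw_data folder.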

Highlighted Details

  • Benchmarks focus on output tokens throughput and Time to First Token (TTFT).
  • Evaluates providers like Anyscale, Bedrock, Fireworks, Groq, Lepton, Perplexity, Replicate, and Together.
  • Results are presented in detailed tables for different model sizes (7B, 13B, 70B).
  • Data was collected as of December 19, 2023.

Maintenance & Community

  • Feedback can be submitted through the project's feedback link.
  • Providers interested in being featured can open an issue or contact the maintainers by email.

Licensing & Compatibility

  • The repository's license is not explicitly stated in the README.

Limitations & Caveats

The results are subject to several sources of bias: differences in provider backend implementations, time of day, client location (which particularly affects TTFT measurements), and existing system load or provider traffic. The measurements are a proxy for performance and may not correlate with every user workload.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 90 days
