guidellm by vllm-project

LLM evaluation platform for real-world inference

Created 1 year ago
577 stars

Top 56.1% on SourcePulse

Project Summary

GuideLLM is a platform for evaluating and optimizing Large Language Model (LLM) deployments for real-world inference. It targets engineers and researchers needing to assess performance, resource utilization, and costs across various hardware configurations, enabling efficient and scalable LLM serving.

How It Works

GuideLLM simulates real-world inference workloads by connecting to an OpenAI-compatible inference server (like vLLM). It then runs benchmarks under different load scenarios, varying request rates and data configurations. The platform collects detailed metrics on throughput, latency, and token generation, allowing users to identify bottlenecks and determine optimal deployment strategies to meet specific service level objectives (SLOs).
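
A sweep run, for example, steps through increasing request rates to find the highest load the server sustains. The invocation below is a minimal sketch based on the project's documented CLI; the target URL assumes a locally running server, and exact flag names may vary between versions:

    guidellm benchmark \
      --target "http://localhost:8000" \
      --rate-type sweep \
      --max-seconds 30 \
      --data "prompt_tokens=256,output_tokens=128"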

Quick Start & Requirements

  • Install: pip install guidellm or pip install git+https://github.com/neuralmagic/guidellm.git
  • Prerequisites: Linux or macOS, Python 3.9-3.13. Requires an OpenAI-compatible inference server (vLLM recommended).
  • Setup: Start an OpenAI-compatible inference server (e.g., vLLM serving a model) before running benchmarks; see the sketch after this list.
  • Docs: Installation Guide, Quick Start, Supported Backends
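
A minimal end-to-end setup sketch, assuming vLLM is installed; the model identifier is illustrative, and any model your hardware can serve will do:

    # Terminal 1: start an OpenAI-compatible server with vLLM
    vllm serve "meta-llama/Llama-3.1-8B-Instruct"

    # Terminal 2: run a short benchmark against it
    guidellm benchmark \
      --target "http://localhost:8000" \
      --rate-type synchronous \
      --max-requests 50 \
      --data "prompt_tokens=128,output_tokens=64"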

Highlighted Details

  • Supports various benchmark rate types: synchronous, throughput, concurrent, constant, poisson, and sweep.
  • Allows benchmarking with synthetic data (configurable prompt/output token counts) or custom datasets (Hugging Face, CSV, JSONL); see the example after this list.
  • Generates detailed results in JSON, YAML, or CSV formats for in-depth analysis and SLO validation.
  • Integrates with Hugging Face datasets and model processors for flexible data handling.
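
For instance, a fixed-rate run over a local dataset with results written out for later SLO analysis. This sketch assumes the flags below match your installed version (check guidellm benchmark --help); data.jsonl is a hypothetical local dataset file:

    guidellm benchmark \
      --target "http://localhost:8000" \
      --rate-type constant \
      --rate 10 \
      --max-seconds 60 \
      --data "data.jsonl" \
      --output-path "results.json"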

Maintenance & Community

  • Developed by Neural Magic (now part of Red Hat); the repository is hosted under the vllm-project GitHub organization.
  • Contribution guidelines and code of conduct are provided.
  • GitHub Releases available for tracking updates.

Licensing & Compatibility

  • Licensed under the Apache License 2.0.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The tool requires an external inference server to be running and configured, adding an initial setup step. While vLLM is recommended, compatibility with other OpenAI-compatible servers may vary.

Health Check

  • Last Commit: 21 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 37
  • Issues (30d): 12
  • Star History: 84 stars in the last 30 days
