guidellm by vllm-project

LLM evaluation platform for real-world inference

Created 1 year ago
577 stars

Top 56.1% on SourcePulse

Project Summary

GuideLLM is a platform for evaluating and optimizing Large Language Model (LLM) deployments for real-world inference. It targets engineers and researchers needing to assess performance, resource utilization, and costs across various hardware configurations, enabling efficient and scalable LLM serving.

How It Works

GuideLLM simulates real-world inference workloads by connecting to an OpenAI-compatible inference server (like vLLM). It then runs benchmarks under different load scenarios, varying request rates and data configurations. The platform collects detailed metrics on throughput, latency, and token generation, allowing users to identify bottlenecks and determine optimal deployment strategies to meet specific service level objectives (SLOs).
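
A sweep run, for example, steps through increasing request rates to find the highest load the server sustains. The invocation below is a minimal sketch based on the project's documented CLI; the target URL assumes a locally running server, and exact flag names may vary between versions:

    guidellm benchmark \
      --target "http://localhost:8000" \
      --rate-type sweep \
      --max-seconds 30 \
      --data "prompt_tokens=256,output_tokens=128"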

Quick Start & Requirements

  • Install: pip install guidellm or pip install git+https://github.com/neuralmagic/guidellm.git
  • Prerequisites: Linux or macOS, Python 3.9-3.13. Requires an OpenAI-compatible inference server (vLLM recommended).
  • Setup: Start an OpenAI-compatible inference server (e.g., vLLM serving a model) before running benchmarks; see the sketch after this list.
  • Docs: Installation Guide, Quick Start, Supported Backends
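
A minimal end-to-end setup sketch, assuming vLLM is installed; the model identifier is illustrative, and any model your hardware can serve will do:

    # Terminal 1: start an OpenAI-compatible server with vLLM
    vllm serve "meta-llama/Llama-3.1-8B-Instruct"

    # Terminal 2: run a short benchmark against it
    guidellm benchmark \
      --target "http://localhost:8000" \
      --rate-type synchronous \
      --max-requests 50 \
      --data "prompt_tokens=128,output_tokens=64"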

Highlighted Details

  • Supports various benchmark rate types: synchronous, throughput, concurrent, constant, poisson, and sweep.
  • Allows benchmarking with synthetic data (configurable prompt/output token counts) or custom datasets (Hugging Face, CSV, JSONL); see the example after this list.
  • Generates detailed results in JSON, YAML, or CSV formats for in-depth analysis and SLO validation.
  • Integrates with Hugging Face datasets and model processors for flexible data handling.
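
For instance, a fixed-rate run over a local dataset with results written out for later SLO analysis. This sketch assumes the flags below match your installed version (check guidellm benchmark --help); data.jsonl is a hypothetical local dataset file:

    guidellm benchmark \
      --target "http://localhost:8000" \
      --rate-type constant \
      --rate 10 \
      --max-seconds 60 \
      --data "data.jsonl" \
      --output-path "results.json"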

Maintenance & Community

  • Developed by Neural Magic (now part of Red Hat); the repository is hosted under the vllm-project GitHub organization.
  • Contribution guidelines and code of conduct are provided.
  • GitHub Releases available for tracking updates.

Licensing & Compatibility

  • Licensed under the Apache License 2.0.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The tool requires an external inference server to be running and configured, adding an initial setup step. While vLLM is recommended, compatibility with other OpenAI-compatible servers may vary.

Health Check

  • Last Commit: 21 hours ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 37
  • Issues (30d): 12
  • Star History: 84 stars in the last 30 days
