guidellm by vllm-project

LLM evaluation platform for real-world inference

created 1 year ago
453 stars

Top 67.6% on sourcepulse

View on GitHub
Project Summary

GuideLLM is a platform for evaluating and optimizing Large Language Model (LLM) deployments for real-world inference. It targets engineers and researchers needing to assess performance, resource utilization, and costs across various hardware configurations, enabling efficient and scalable LLM serving.

How It Works

GuideLLM simulates real-world inference workloads by connecting to an OpenAI-compatible inference server (like vLLM). It then runs benchmarks under different load scenarios, varying request rates and data configurations. The platform collects detailed metrics on throughput, latency, and token generation, allowing users to identify bottlenecks and determine optimal deployment strategies to meet specific service level objectives (SLOs).
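
A minimal sketch of such a run, using flags from the project's quick-start docs (exact names may vary by version):

  guidellm benchmark \
    --target "http://localhost:8000" \
    --rate-type sweep \
    --max-seconds 30 \
    --data "prompt_tokens=256,output_tokens=128"

Here the sweep rate type steps from synchronous up to maximum throughput, while the --data string requests synthetic prompts of about 256 input and 128 output tokens.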

Quick Start & Requirements

  • Install: pip install guidellm or pip install git+https://github.com/neuralmagic/guidellm.git
  • Prerequisites: Linux or macOS, Python 3.9-3.13. Requires an OpenAI-compatible inference server (vLLM recommended).
  • Setup: Start an OpenAI-compatible inference server (e.g., vLLM serving a model) before running benchmarks; a setup sketch follows this list.
  • Docs: Installation Guide, Quick Start, Supported Backends
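
As a sketch of the end-to-end setup (the model name is only an example), start a vLLM server and then point GuideLLM at it:

  pip install guidellm
  vllm serve "Qwen/Qwen2.5-1.5B-Instruct"
  guidellm benchmark --target "http://localhost:8000" --data "prompt_tokens=128,output_tokens=64" --max-requests 100

vLLM exposes an OpenAI-compatible endpoint on port 8000 by default, and guidellm benchmark sends requests against it until the --max-requests budget is reached.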

Highlighted Details

  • Supports various benchmark rate types: synchronous, throughput, concurrent, constant, poisson, and sweep.
  • Allows benchmarking with synthetic data (configurable prompt/output tokens) or custom datasets (HuggingFace, CSV, JSONL).
  • Generates detailed results in JSON, YAML, or CSV formats for in-depth analysis and SLO validation (see the example after this list).
  • Integrates with Hugging Face datasets and model processors for flexible data handling.
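
As an illustration of these options (the dataset ID and output path are placeholders, and flag spellings may differ across releases):

  guidellm benchmark \
    --target "http://localhost:8000" \
    --rate-type concurrent \
    --rate 8 \
    --data "openai/gsm8k" \
    --output-path results.json

This would keep eight requests in flight at a time, draw prompts from a Hugging Face dataset rather than synthetic text, and write the full metrics report to results.json for later analysis.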

Maintenance & Community

  • Developed by Neural Magic, Inc.; the repository is now hosted under the vllm-project organization.
  • Contribution guidelines and code of conduct are provided.
  • GitHub Releases available for tracking updates.

Licensing & Compatibility

  • Licensed under the Apache License 2.0.
  • Permissive license suitable for commercial use and integration into closed-source projects.

Limitations & Caveats

The tool requires an external inference server to be running and configured, adding an initial setup step. While vLLM is recommended, compatibility with other OpenAI-compatible servers may vary.

Health Check

  • Last commit: 21 hours ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 29
  • Issues (30d): 14
  • Star History: 178 stars in the last 90 days
