aiperf by ai-dynamo

Benchmark generative AI model performance across diverse inference solutions

Created 1 year ago
251 stars

Top 99.9% on SourcePulse

Project Summary

Summary

AIPerf is a comprehensive benchmarking tool for measuring generative AI model performance across various inference solutions. It provides detailed metrics via a command-line display and extensive reports, targeting engineers and researchers needing to evaluate and optimize AI model deployments.

How It Works

AIPerf uses a scalable multiprocess architecture in which services communicate over ZMQ. It supports diverse benchmarking modes (concurrency, request-rate, trace replay) and offers three UI options: a real-time TUI dashboard, simple progress bars, or headless execution. A key advantage is its extensibility: a plugin system allows custom endpoints, datasets, transports, and metrics.
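The multiprocess, message-passing design can be sketched in miniature. This is an illustrative toy, not AIPerf's implementation: the real services communicate over ZMQ sockets, whereas here `multiprocessing.Queue` stands in for the message bus, and the worker logic is invented for demonstration.

```python
import multiprocessing as mp

def worker(task_q: mp.Queue, result_q: mp.Queue) -> None:
    # A stand-in "worker service": consumes request IDs from the bus and
    # reports one latency record per request. A real worker would call the
    # inference endpoint and time the response.
    while True:
        req = task_q.get()
        if req is None:  # poison pill -> shut down
            break
        result_q.put({"id": req, "latency_ms": req * 10})  # fake latency

def run_benchmark(num_requests: int, num_workers: int = 2) -> list[dict]:
    task_q, result_q = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(task_q, result_q))
             for _ in range(num_workers)]
    for p in procs:
        p.start()
    for i in range(1, num_requests + 1):
        task_q.put(i)
    for _ in procs:          # one poison pill per worker
        task_q.put(None)
    results = [result_q.get() for _ in range(num_requests)]
    for p in procs:
        p.join()
    return results

if __name__ == "__main__":
    records = run_benchmark(4)
    print(sorted(r["id"] for r in records))  # -> [1, 2, 3, 4]
```

The same fan-out/fan-in shape is what lets a benchmarker scale load generation across processes while a single controller aggregates metrics.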

Quick Start & Requirements

  • Installation: pip install aiperf within a Python 3 virtual environment.
  • Prerequisites: Docker Desktop is used for setting up example inference servers (e.g., Ollama). A compatible inference server and the model's tokenizer are also required.
  • Setup: Example involves Docker commands to run Ollama and pull models.
  • Links: Tutorials and guides are available for profiling models with vLLM, Hugging Face TGI, and OpenAI-compatible APIs.

Highlighted Details

  • Supports multiple benchmarking modes: concurrency, request-rate, and trace replay.
  • Extensive dataset support, including ShareGPT, custom formats, and specialized datasets for multimodal models, code generation, and speculative decoding.
  • Offers flexible UI modes (dashboard TUI, simple, headless) and supports various endpoint types (OpenAI chat/embeddings, NIM, etc.).
  • Features advanced load control (arrival patterns, ramping, warmup) and user-centric timing for realistic performance analysis.
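As a concrete example of an arrival pattern, request-rate load generators commonly schedule requests as a Poisson process (exponentially distributed inter-arrival gaps). The summary does not specify AIPerf's exact scheduler, so this is a generic sketch of the technique rather than its implementation:

```python
import random

def poisson_schedule(rate_per_s: float, duration_s: float,
                     seed: int = 0) -> list[float]:
    """Request timestamps (seconds) for a Poisson arrival process.

    Inter-arrival gaps of a Poisson process are exponentially distributed
    with mean 1 / rate, so we accumulate exponential draws until the
    benchmark duration is reached.
    """
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_s)
        if t >= duration_s:
            return times
        times.append(t)

# ~300 requests expected over 30 s at 10 req/s (rate * duration).
schedule = poisson_schedule(rate_per_s=10.0, duration_s=30.0)
```

A load generator then fires each request at its scheduled timestamp, which produces bursty, realistic traffic rather than a fixed metronome; ramping and warmup phases amount to varying `rate_per_s` over time.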

Maintenance & Community

A CONTRIBUTING.md file is provided for development setup and contribution guidelines. No specific community links (Discord, Slack), maintainer information, sponsorships, or roadmap details are present in the README.

Licensing & Compatibility

The README does not explicitly state the project's license. Clarify licensing with the maintainers before assuming compatibility with commercial use or closed-source linking.

Limitations & Caveats

Output sequence length constraints may not be guaranteed without specific inference server support. High concurrency settings (>15,000) might cause port exhaustion. Startup errors from invalid configurations can lead to indefinite hangs. Dashboard UI text copying may be unreliable; use the 'c' key for full log copy.
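The high-concurrency caveat follows from ordinary ephemeral-port arithmetic: each open client connection consumes one local port, and closed connections linger in TIME_WAIT (about 60 s on Linux), so churn pushes demand above the steady-state concurrency. A back-of-the-envelope check, assuming Linux's default `net.ipv4.ip_local_port_range` of 32768-60999:

```python
# Each outbound client connection to the same server address consumes one
# local ephemeral port. Linux's default range is 32768-60999.
low, high = 32768, 60999
available = high - low + 1
print(available)  # -> 28232 usable ports

concurrency = 15_000
headroom = available - concurrency
print(headroom)  # -> 13232 ports left for churn / TIME_WAIT sockets
```

With connection churn, sockets stuck in TIME_WAIT also hold ports, so effective demand at 15,000+ concurrency can approach the 28,232-port ceiling and trigger exhaustion.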

Health Check

  • Last commit: 10 hours ago
  • Responsiveness: Inactive
  • Pull requests (30d): 74
  • Issues (30d): 1
  • Star history: 57 stars in the last 30 days

Explore Similar Projects

  • LitServe by Lightning-AI: AI inference pipeline framework (4k stars; top 0.2% on SourcePulse; created 2 years ago, updated 2 weeks ago)
  • vllm-omni by vllm-project: omni-modality model inference and serving framework (5k stars; top 2.6% on SourcePulse; created 7 months ago, updated 6 hours ago)