aiperf by ai-dynamo

Benchmark generative AI model performance across diverse inference solutions

Created 1 year ago
251 stars

Top 99.9% on SourcePulse

Project Summary

Summary

AIPerf is a comprehensive benchmarking tool for measuring generative AI model performance across various inference solutions. It provides detailed metrics via a command-line display and extensive reports, targeting engineers and researchers needing to evaluate and optimize AI model deployments.

How It Works

AIPerf uses a scalable multiprocess architecture in which services communicate over ZMQ. It supports diverse benchmarking modes (concurrency, request-rate, trace replay) and offers three UI options: a real-time TUI dashboard, simple progress bars, or headless execution. A key advantage is its extensibility: a plugin system allows custom endpoints, datasets, transports, and metrics.
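The multiprocess, message-passing design can be sketched in miniature. This is an illustrative toy, not AIPerf's implementation: the real services communicate over ZMQ sockets, whereas here `multiprocessing.Queue` stands in for the message bus, and the worker logic is invented for demonstration.

```python
import multiprocessing as mp

def worker(task_q: mp.Queue, result_q: mp.Queue) -> None:
    # A stand-in "worker service": consumes request IDs from the bus and
    # reports one latency record per request. A real worker would call the
    # inference endpoint and time the response.
    while True:
        req = task_q.get()
        if req is None:  # poison pill -> shut down
            break
        result_q.put({"id": req, "latency_ms": req * 10})  # fake latency

def run_benchmark(num_requests: int, num_workers: int = 2) -> list[dict]:
    task_q, result_q = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(task_q, result_q))
             for _ in range(num_workers)]
    for p in procs:
        p.start()
    for i in range(1, num_requests + 1):
        task_q.put(i)
    for _ in procs:          # one poison pill per worker
        task_q.put(None)
    results = [result_q.get() for _ in range(num_requests)]
    for p in procs:
        p.join()
    return results

if __name__ == "__main__":
    records = run_benchmark(4)
    print(sorted(r["id"] for r in records))  # -> [1, 2, 3, 4]
```

The same fan-out/fan-in shape is what lets a benchmarker scale load generation across processes while a single controller aggregates metrics.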

Quick Start & Requirements

  • Installation: pip install aiperf within a Python 3 virtual environment.
  • Prerequisites: Docker Desktop is used for setting up example inference servers (e.g., Ollama). A compatible inference server and the model's tokenizer are also required.
  • Setup: Example involves Docker commands to run Ollama and pull models.
  • Links: Tutorials and guides are available for profiling models with vLLM, Hugging Face TGI, and OpenAI-compatible APIs.

Highlighted Details

  • Supports multiple benchmarking modes: concurrency, request-rate, and trace replay.
  • Extensive dataset support, including ShareGPT, custom formats, and specialized datasets for multimodal models, code generation, and speculative decoding.
  • Offers flexible UI modes (dashboard TUI, simple, headless) and supports various endpoint types (OpenAI chat/embeddings, NIM, etc.).
  • Features advanced load control (arrival patterns, ramping, warmup) and user-centric timing for realistic performance analysis.
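As a concrete example of an arrival pattern, request-rate load generators commonly schedule requests as a Poisson process (exponentially distributed inter-arrival gaps). The summary does not specify AIPerf's exact scheduler, so this is a generic sketch of the technique rather than its implementation:

```python
import random

def poisson_schedule(rate_per_s: float, duration_s: float,
                     seed: int = 0) -> list[float]:
    """Request timestamps (seconds) for a Poisson arrival process.

    Inter-arrival gaps of a Poisson process are exponentially distributed
    with mean 1 / rate, so we accumulate exponential draws until the
    benchmark duration is reached.
    """
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_per_s)
        if t >= duration_s:
            return times
        times.append(t)

# ~300 requests expected over 30 s at 10 req/s (rate * duration).
schedule = poisson_schedule(rate_per_s=10.0, duration_s=30.0)
```

A load generator then fires each request at its scheduled timestamp, which produces bursty, realistic traffic rather than a fixed metronome; ramping and warmup phases amount to varying `rate_per_s` over time.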

Maintenance & Community

A CONTRIBUTING.md file is provided for development setup and contribution guidelines. No specific community links (Discord, Slack), maintainer information, sponsorships, or roadmap details are present in the README.

Licensing & Compatibility

The README does not explicitly state the project's license. Clarify licensing with the maintainers before assuming compatibility with commercial use or closed-source linking.

Limitations & Caveats

Output sequence length constraints may not be guaranteed without specific inference server support. High concurrency settings (>15,000) might cause port exhaustion. Startup errors from invalid configurations can lead to indefinite hangs. Dashboard UI text copying may be unreliable; use the 'c' key for full log copy.
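The high-concurrency caveat follows from ordinary ephemeral-port arithmetic: each open client connection consumes one local port, and closed connections linger in TIME_WAIT (about 60 s on Linux), so churn pushes demand above the steady-state concurrency. A back-of-the-envelope check, assuming Linux's default `net.ipv4.ip_local_port_range` of 32768-60999:

```python
# Each outbound client connection to the same server address consumes one
# local ephemeral port. Linux's default range is 32768-60999.
low, high = 32768, 60999
available = high - low + 1
print(available)  # -> 28232 usable ports

concurrency = 15_000
headroom = available - concurrency
print(headroom)  # -> 13232 ports left for churn / TIME_WAIT sockets
```

With connection churn, sockets stuck in TIME_WAIT also hold ports, so effective demand at 15,000+ concurrency can approach the 28,232-port ceiling and trigger exhaustion.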

Health Check

  • Last commit: 10 hours ago
  • Responsiveness: Inactive
  • Pull requests (30d): 74
  • Issues (30d): 1
  • Star history: 57 stars in the last 30 days

Explore Similar Projects

  • LitServe by Lightning-AI: AI inference pipeline framework (4k stars; top 0.2% on SourcePulse; created 2 years ago, updated 2 weeks ago)
  • vllm-omni by vllm-project: omni-modality model inference and serving framework (5k stars; top 2.6% on SourcePulse; created 7 months ago, updated 6 hours ago)