Discover and explore top open-source AI tools and projects—updated daily.
eugrLLM inference benchmarking tool for OpenAI-compatible endpoints
Top 68.8% on SourcePulse
This tool addresses the challenge of benchmarking Large Language Model (LLM) inference endpoints, particularly for backends beyond llama.cpp and for accurately measuring prompt processing speeds at varying context lengths. It targets engineers, researchers, and power users evaluating LLM performance, offering a standardized method to assess prompt processing and token generation speeds, TTFR, est_ppt, and e2e_ttft across different OpenAI-compatible services.
How It Works
llama-benchy operates by sending requests to OpenAI-compatible LLM endpoints, systematically varying prompt lengths, generation lengths, and context depths. It measures key performance indicators like Prompt Processing (pp) and Token Generation (tg) speeds, alongside Time To First Response (TTFR), Estimated Prompt Processing Time (est_ppt), and End-to-End Time To First Token (e2e_ttft). The tool leverages HuggingFace tokenizers for accuracy and uses realistic text from Project Gutenberg for prompt generation, aiding in the evaluation of speculative decoding and MTP. A notable feature is its ability to benchmark prefix caching performance, providing insights into how well inference servers handle repeated contexts.
Quick Start & Requirements
Installation is recommended using uv. The simplest way to run is via uvx llama-benchy --base-url <ENDPOINT_URL> --model <MODEL_NAME>. Alternatively, clone the repository and install using uv pip install -e . within a virtual environment. Key requirements include an OpenAI-compatible LLM endpoint.
Highlighted Details
Maintenance & Community
The project shows recent development activity, with a version dated February 6, 2026. No specific community links (e.g., Discord, Slack) or notable contributors are detailed in the provided README.
Licensing & Compatibility
The license type is not explicitly stated in the provided README. This omission requires further investigation for commercial use or closed-source integration.
Limitations & Caveats
The tool currently only evaluates against /v1/chat/completions endpoints. The absence of a stated license is a significant caveat for adoption.
2 weeks ago
Inactive
ray-project
kagisearch
EricLBuehler