CLI tool for local LLM throughput benchmarking via Ollama
This project provides a command-line tool for benchmarking the throughput of local Large Language Models (LLMs) managed by Ollama. It's designed for users and developers who want to measure and compare the performance of different LLMs running on their own hardware.
How It Works
The tool drives local LLM instances through Ollama's API. It detects available system RAM, selects and downloads appropriately sized models, then sends generation requests to each model and times the responses to compute throughput.
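For illustration, the measurement principle can be reproduced with a short script against Ollama's HTTP API. This is a minimal sketch of the general approach, not the project's own code; it assumes Ollama is running on its default port (11434) and that the named model has already been pulled.

```python
import time
import requests  # third-party; pip install requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def measure_throughput(model: str, prompt: str) -> float:
    """Send one non-streaming generation request and return tokens per second."""
    start = time.perf_counter()
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,
    )
    resp.raise_for_status()
    data = resp.json()
    # Ollama reports eval_count (generated tokens) and eval_duration (nanoseconds);
    # fall back to wall-clock timing if those fields are absent.
    if data.get("eval_count") and data.get("eval_duration"):
        return data["eval_count"] / (data["eval_duration"] / 1e9)
    elapsed = time.perf_counter() - start
    return len(data.get("response", "").split()) / elapsed  # rough word-based fallback

if __name__ == "__main__":
    tps = measure_throughput("gemma:2b", "Explain LLM throughput in one sentence.")
    print(f"{tps:.1f} tokens/s")
```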
Quick Start & Requirements
pip install llm-benchmark
or pipx install llm-benchmark
A local Ollama installation must be running, since the tool talks to Ollama's API.
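Once installed, a benchmark is started from the command line; assuming the package installs an llm_benchmark console script (suggested by the package name, not confirmed here), the basic invocation would be:
llm_benchmark run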
Model sets pulled for benchmarking, by detected system RAM (smallest tier to largest):
gemma:2b
phi3:3.8b, gemma2:9b, mistral:7b, llama3.1:8b, deepseek-r1:8b, llava:7b
gemma2:9b, mistral:7b, phi4:14b, deepseek-r1:8b, deepseek-r1:14b, llava:7b, llava:13b
31GB RAM: phi4:14b, deepseek-r1:14b, deepseek-r1:32b
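The RAM-based tier selection described above can be sketched as follows. This is an illustration only, not the tool's actual logic: the models_for_this_machine helper is hypothetical, RAM detection via psutil is an assumption, and every threshold except the 31GB one listed above is a placeholder.

```python
import psutil  # third-party; pip install psutil

# Tiers mirror the model groups listed above. Only the 31 GB threshold comes
# from the summary; the lower thresholds are illustrative placeholders.
MODEL_TIERS = [
    (31, ["phi4:14b", "deepseek-r1:14b", "deepseek-r1:32b"]),
    (15, ["gemma2:9b", "mistral:7b", "phi4:14b", "deepseek-r1:8b",   # threshold assumed
          "deepseek-r1:14b", "llava:7b", "llava:13b"]),
    (7,  ["phi3:3.8b", "gemma2:9b", "mistral:7b", "llama3.1:8b",     # threshold assumed
          "deepseek-r1:8b", "llava:7b"]),
    (0,  ["gemma:2b"]),
]

def models_for_this_machine() -> list[str]:
    """Return the model set for the highest RAM tier this machine meets."""
    total_gb = psutil.virtual_memory().total / 2**30
    for min_gb, models in MODEL_TIERS:
        if total_gb >= min_gb:
            return models
    return MODEL_TIERS[-1][1]

if __name__ == "__main__":
    print(models_for_this_machine())
```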
Highlighted Details
Maintenance & Community
No specific contributors, sponsorships, or community links (Discord/Slack) are mentioned in the README.
Licensing & Compatibility
The README does not explicitly state a license.
Limitations & Caveats
Model selection is tied to detected RAM, so the preset model lists above may not cover every available Ollama model or match user preferences. The README does not describe error handling and reports no performance metrics beyond throughput.