ollama-benchmark by aidatatools

CLI tool for local LLM throughput benchmarking via Ollama

created 1 year ago
265 stars

Top 97.2% on sourcepulse

Project Summary

This project provides a command-line tool for benchmarking the throughput of local Large Language Models (LLMs) managed by Ollama. It's designed for users and developers who want to measure and compare the performance of different LLMs running on their own hardware.

How It Works

The tool uses Ollama's API to interact with local LLM instances. It detects available system RAM to suggest and download appropriately sized models for benchmarking, then sends prompts to each model and measures generation time to compute throughput.
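As a minimal sketch of the measurement idea (not the project's actual code): Ollama's /api/generate endpoint reports eval_count (tokens generated) and eval_duration (nanoseconds) in each non-streaming response, from which tokens per second follow directly. The model name and prompt below are placeholders.

```python
import requests  # third-party HTTP client; assumes a local Ollama server is running

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default generate endpoint

def measure_throughput(model: str, prompt: str) -> float:
    """Run one non-streaming generation and return tokens per second,
    computed from the eval_count/eval_duration fields Ollama reports."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,
    )
    resp.raise_for_status()
    data = resp.json()
    return data["eval_count"] / (data["eval_duration"] / 1e9)  # eval_duration is in ns

# Placeholder model/prompt; any model already pulled into Ollama works.
print(f"{measure_throughput('mistral:7b', 'Why is the sky blue?'):.1f} tokens/s")
```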

Quick Start & Requirements

  • Install via pip: pip install llm-benchmark or pipx install llm-benchmark.
  • Requires a working Ollama installation with models pulled.
  • Tested on Python 3.9+.
  • Automatic model downloading based on RAM (tier logic sketched in Python after this list):
    • <7GB RAM: gemma:2b
    • 7-15GB RAM: phi3:3.8b, gemma2:9b, mistral:7b, llama3.1:8b, deepseek-r1:8b, llava:7b
    • 15-31GB RAM: gemma2:9b, mistral:7b, phi4:14b, deepseek-r1:8b, deepseek-r1:14b, llava:7b, llava:13b
    • >31GB RAM: phi4:14b, deepseek-r1:14b, deepseek-r1:32b

  • Advanced/developer installation uses Poetry: https://python-poetry.org/docs/#installing-manually
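The RAM tiers above map directly to code. A sketch, assuming psutil for RAM detection (the tool's actual implementation may differ):

```python
import psutil  # assumed here for RAM detection; not necessarily what the tool uses

def models_for_ram() -> list[str]:
    """Map total system RAM (GiB) to the model tiers listed above."""
    gib = psutil.virtual_memory().total / 2**30
    if gib < 7:
        return ["gemma:2b"]
    if gib < 15:
        return ["phi3:3.8b", "gemma2:9b", "mistral:7b", "llama3.1:8b",
                "deepseek-r1:8b", "llava:7b"]
    if gib < 31:
        return ["gemma2:9b", "mistral:7b", "phi4:14b", "deepseek-r1:8b",
                "deepseek-r1:14b", "llava:7b", "llava:13b"]
    return ["phi4:14b", "deepseek-r1:14b", "deepseek-r1:32b"]
```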

Highlighted Details

  • Benchmarks LLM throughput using Ollama.
  • Automatically selects and pulls models based on system RAM.
  • Supports custom benchmark model lists via YAML files (a hypothetical example follows this list).
  • Option to send benchmark results and system info to a remote server.
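For the custom model list, the layout below is hypothetical; the authoritative schema is whatever the YAML files bundled with the project use, so copy one of those as a starting point:

```yaml
# Hypothetical custom benchmark list; field names are illustrative
# assumptions, not the project's documented schema.
models:
  - model: mistral:7b
  - model: llama3.1:8b
```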

Maintenance & Community

No specific contributors, sponsorships, or community links (Discord/Slack) are mentioned in the README.

Licensing & Compatibility

The README does not explicitly state a license.

Limitations & Caveats

Model selection is tied to total system RAM, so the preset model lists may not cover every available Ollama model or match user preferences. The README does not detail error handling, and throughput is the only performance metric reported.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star history: 49 stars in the last 90 days
