ollama-benchmark by aidatatools

CLI tool for local LLM throughput benchmarking via Ollama

created 1 year ago
265 stars

Top 97.2% on sourcepulse

Project Summary

This project provides a command-line tool for benchmarking the throughput of local Large Language Models (LLMs) managed by Ollama. It's designed for users and developers who want to measure and compare the performance of different LLMs running on their own hardware.

How It Works

The tool uses Ollama's API to interact with local LLM instances. It detects available system RAM to suggest and download appropriately sized models for benchmarking, then sends prompts to each model and measures generation time to compute throughput.
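As a minimal sketch of the measurement idea (not the project's actual code): Ollama's /api/generate endpoint reports eval_count (tokens generated) and eval_duration (nanoseconds) in each non-streaming response, from which tokens per second follow directly. The model name and prompt below are placeholders.

```python
import requests  # third-party HTTP client; assumes a local Ollama server is running

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default generate endpoint

def measure_throughput(model: str, prompt: str) -> float:
    """Run one non-streaming generation and return tokens per second,
    computed from the eval_count/eval_duration fields Ollama reports."""
    resp = requests.post(
        OLLAMA_URL,
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,
    )
    resp.raise_for_status()
    data = resp.json()
    return data["eval_count"] / (data["eval_duration"] / 1e9)  # eval_duration is in ns

# Placeholder model/prompt; any model already pulled into Ollama works.
print(f"{measure_throughput('mistral:7b', 'Why is the sky blue?'):.1f} tokens/s")
```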

Quick Start & Requirements

  • Install via pip: pip install llm-benchmark or pipx install llm-benchmark.
  • Requires a working Ollama installation with models pulled.
  • Tested on Python 3.9+.
  • Automatic model downloading based on RAM (tier logic sketched in Python after this list):
    • <7GB RAM: gemma:2b
    • 7-15GB RAM: phi3:3.8b, gemma2:9b, mistral:7b, llama3.1:8b, deepseek-r1:8b, llava:7b
    • 15-31GB RAM: gemma2:9b, mistral:7b, phi4:14b, deepseek-r1:8b, deepseek-r1:14b, llava:7b, llava:13b
    • >31GB RAM: phi4:14b, deepseek-r1:14b, deepseek-r1:32b

  • Advanced/developer installation uses Poetry: https://python-poetry.org/docs/#installing-manually
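The RAM tiers above map directly to code. A sketch, assuming psutil for RAM detection (the tool's actual implementation may differ):

```python
import psutil  # assumed here for RAM detection; not necessarily what the tool uses

def models_for_ram() -> list[str]:
    """Map total system RAM (GiB) to the model tiers listed above."""
    gib = psutil.virtual_memory().total / 2**30
    if gib < 7:
        return ["gemma:2b"]
    if gib < 15:
        return ["phi3:3.8b", "gemma2:9b", "mistral:7b", "llama3.1:8b",
                "deepseek-r1:8b", "llava:7b"]
    if gib < 31:
        return ["gemma2:9b", "mistral:7b", "phi4:14b", "deepseek-r1:8b",
                "deepseek-r1:14b", "llava:7b", "llava:13b"]
    return ["phi4:14b", "deepseek-r1:14b", "deepseek-r1:32b"]
```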

Highlighted Details

  • Benchmarks LLM throughput using Ollama.
  • Automatically selects and pulls models based on system RAM.
  • Supports custom benchmark model lists via YAML files (a hypothetical example follows this list).
  • Option to send benchmark results and system info to a remote server.
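For the custom model list, the layout below is hypothetical; the authoritative schema is whatever the YAML files bundled with the project use, so copy one of those as a starting point:

```yaml
# Hypothetical custom benchmark list; field names are illustrative
# assumptions, not the project's documented schema.
models:
  - model: mistral:7b
  - model: llama3.1:8b
```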

Maintenance & Community

No specific contributors, sponsorships, or community links (Discord/Slack) are mentioned in the README.

Licensing & Compatibility

The README does not explicitly state a license.

Limitations & Caveats

Model selection is tied to total system RAM, so the preset model lists may not cover every available Ollama model or match user preferences. The README does not detail error handling, and throughput is the only performance metric reported.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star history: 49 stars in the last 90 days
