ollama-benchmark by aidatatools

CLI tool for local LLM throughput benchmarking via Ollama

Created 1 year ago
293 stars

Top 90.1% on SourcePulse

Project Summary

This project provides a command-line tool for benchmarking the throughput of local Large Language Models (LLMs) managed by Ollama. It's designed for users and developers who want to measure and compare the performance of different LLMs running on their own hardware.

How It Works

The tool leverages Ollama's API to interact with local LLM instances. It automatically detects available system RAM to suggest and download appropriate LLM models for benchmarking. The core functionality involves sending requests to these models and measuring the time taken to generate responses, thereby calculating throughput.
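As a minimal sketch of the measurement idea (an illustration, not the tool's actual implementation), a tokens-per-second figure can be computed from the eval_count and eval_duration fields that Ollama's /api/generate endpoint returns:

    # Minimal sketch: measure generation throughput for one local model.
    # Assumes Ollama is serving on its default port 11434; this is an
    # illustration, not ollama-benchmark's actual code.
    import requests

    def tokens_per_second(model: str, prompt: str) -> float:
        resp = requests.post(
            "http://localhost:11434/api/generate",
            json={"model": model, "prompt": prompt, "stream": False},
            timeout=600,
        )
        resp.raise_for_status()
        data = resp.json()
        # eval_count = tokens generated; eval_duration = time spent
        # generating, in nanoseconds
        return data["eval_count"] / (data["eval_duration"] / 1e9)

    print(f"{tokens_per_second('mistral:7b', 'Why is the sky blue?'):.1f} tokens/s")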

Quick Start & Requirements

  • Install via pip: pip install llm-benchmark or pipx install llm-benchmark (see the quick-start commands after this list).
  • Requires a working Ollama installation with models pulled.
  • Tested on Python 3.9+.
  • Automatic model downloading based on detected system RAM:
    • <7GB RAM: gemma:2b
    • 7-15GB RAM: phi3:3.8b, gemma2:9b, mistral:7b, llama3.1:8b, deepseek-r1:8b, llava:7b
    • 15-31GB RAM: gemma2:9b, mistral:7b, phi4:14b, deepseek-r1:8b, deepseek-r1:14b, llava:7b, llava:13b
    • >31GB RAM: phi4:14b, deepseek-r1:14b, deepseek-r1:32b

  • Advanced/developer installs use Poetry; see https://python-poetry.org/docs/#installing-manually.
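Assuming the package installs an llm_benchmark console entry point (verify with llm_benchmark --help on your machine), a first run looks like:

    # Install the CLI, then benchmark the models matching your RAM tier
    pip install llm-benchmark
    llm_benchmark run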

Highlighted Details

  • Benchmarks LLM throughput using Ollama.
  • Automatically selects and pulls models based on system RAM.
  • Supports custom benchmark model lists via YAML files (illustrative sketch after this list).
  • Option to send benchmark results and system info to a remote server.
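This summary doesn't reproduce the custom-list YAML schema, so the file below is a hypothetical illustration only: the model tags are real Ollama names, but the field layout is an assumption, not the project's confirmed format.

    # custom_models.yml - hypothetical layout; field names are assumptions
    models:
      - model: mistral:7b
      - model: llama3.1:8b
      - model: phi4:14b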

Maintenance & Community

No specific contributors, sponsorships, or community links (Discord/Slack) are mentioned in the README.

Licensing & Compatibility

The README does not explicitly state a license.

Limitations & Caveats

Model selection is keyed to system RAM, so the preset model lists may not cover every available Ollama model or match user preferences. The README does not detail error handling, nor any performance metrics beyond throughput.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 15 stars in the last 30 days
