alexziskind1 / `llama.cpp` server throughput benchmarking harness
Top 94.7% on SourcePulse
This repository provides an interactive launcher and benchmarking harness for llama.cpp server throughput. It enables engineers and researchers to systematically test, sweep parameters, and analyze the performance of llama.cpp deployments under various load conditions, facilitating optimization and adoption decisions.
How It Works
This project offers a dialog-based launcher (./run_llama_tests.py) to configure and execute throughput tests and parameter sweeps for the llama.cpp server. It supports single-request, concurrent, and round-robin load testing (requiring nginx), along with sweeps that explore parameter ranges like threads, concurrency, and instances. The system heavily relies on environment variables for detailed configuration of model paths, server arguments, and test parameters, allowing for deep customization of benchmarking scenarios.
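Since configuration is driven by environment variables, a typical session sets them before starting the launcher. The variable names below come from the project's documentation; the paths and values are placeholder assumptions:

```shell
# Hypothetical paths/values; only the variable names come from the project docs.
export LLAMA_MODEL_PATH="$HOME/models/model.Q4_K_M.gguf"   # GGUF model to serve
export LLAMA_CPP_DIR="$HOME/src/llama.cpp"                 # checkout with llama-server built
export LLAMA_SERVER_HOST="127.0.0.1"                       # host the server binds to
export LLAMA_CONCURRENCY="8"                               # concurrent request count
# ./run_llama_tests.py   # uncomment to start the dialog-based launcher
```

Because the launcher reads its configuration from the environment, the same exports can be reused across interactive runs and direct script invocations.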
Quick Start & Requirements
Requirements:
- dialog package (e.g., sudo apt-get install dialog on Debian/Ubuntu)
- llama.cpp cloned and built so the llama-server binary is available
- nginx installed (for round-robin tests and sweeps)
- a GGUF model file

Run ./run_llama_tests.py for the interactive launcher, or run tests/scripts directly with Python (e.g., .venv/bin/python -m unittest tests/test_llama_server_concurrent.py). Configuration is driven by environment variables (e.g., LLAMA_MODEL_PATH, LLAMA_CPP_DIR, LLAMA_SERVER_HOST, LLAMA_CONCURRENCY).
Highlighted Details
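The bundled analyze-data.py processes sweep-results CSV files and sorts them by metrics such as throughput or errors. The same kind of operation can be sketched with coreutils; the file layout and column names below are assumptions, not the script's actual schema:

```shell
# Hypothetical sweep-results file; the real columns in analyze-data.py's output may differ.
cat > /tmp/sweep_results.csv <<'EOF'
instances,concurrency,tokens_per_sec,errors
1,4,85.2,0
2,8,140.7,1
1,8,110.3,0
EOF
# Sort data rows by the throughput column (3rd), descending, keeping the header first:
{ head -n1 /tmp/sweep_results.csv
  tail -n +2 /tmp/sweep_results.csv | sort -t, -k3,3 -nr; } > /tmp/sorted.csv
```

After sorting, the top data row is the configuration with the highest throughput, which is the quickest way to spot the best-performing sweep point.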
- Supports single-request, concurrent, and round-robin load testing (round-robin requires nginx).
- Sweeps cover thread counts (--threads/--threads-http), round-robin configurations (max tokens x concurrency), and full sweeps (instances x parallel x concurrency).
- Includes an analyze-data.py script for processing sweep-results CSV files, enabling sorting by throughput, errors, and other metrics.
Maintenance & Community
No specific information regarding maintainers, community channels (e.g., Discord, Slack), or project roadmap is provided in the README.
Licensing & Compatibility
The README does not specify a software license. This absence may pose compatibility concerns for commercial use or integration into closed-source projects.
Limitations & Caveats
A pre-built llama.cpp with the llama-server binary is a mandatory prerequisite. nginx is required for round-robin tests and sweeps. Sweep scripts automatically manage certain flags (--parallel, --batch-size, --ubatch), preventing their direct use via LLAMA_SERVER_ARGS during sweeps. The lack of explicit licensing information is a significant caveat for adoption.