BenchLocal by stevibe

Desktop app for local LLM benchmarking and comparison

Created 3 months ago

370 stars

Top 76.2% on SourcePulse

Project Summary

BenchLocal is a local-first desktop application for running, comparing, and managing installable LLM Bench Packs. It targets developers and researchers who need to evaluate LLM performance on real-world tasks, and it offers a reproducible benchmarking workflow with side-by-side model comparisons.

How It Works

BenchLocal runs as a desktop application that manages a shared runtime for LLM providers, model registries, and Bench Pack installations. Each Bench Pack encapsulates one benchmark's behavior: scenario definitions, prompts, scoring logic, and verifier contracts. The application executes these Bench Packs against configured local or remote models, keeps run history, and supports per-tab sampling overrides. An integrated Agent Access API lets AI agents and automation tools control benchmark workflows programmatically through HTTP commands and Server-Sent Events while the desktop UI remains active.
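To make the Bench Pack idea concrete, here is a minimal sketch of a pack as a set of scenarios with prompts and scoring logic, run against several models for a side-by-side comparison. Every name in it (Scenario, BenchPack, ModelFn, runPack, compareModels) is hypothetical and chosen for illustration; the source does not show BenchLocal's actual SDK types. Models are modeled as synchronous functions to keep the sketch short, whereas a real provider call would be asynchronous.

```typescript
// Hypothetical shapes for illustration only; BenchLocal's real SDK types
// are not shown in the source.
interface Scenario {
  id: string;
  prompt: string;
  // Scoring logic: maps a model's raw output to a score in [0, 1].
  score: (output: string) => number;
}

interface BenchPack {
  name: string;
  scenarios: Scenario[];
}

// Assumption: a model is a synchronous prompt -> text function.
type ModelFn = (prompt: string) => string;

interface ScenarioResult {
  scenarioId: string;
  output: string;
  score: number;
}

// Run every scenario of a pack against one model and score each output.
function runPack(pack: BenchPack, model: ModelFn): ScenarioResult[] {
  return pack.scenarios.map((s) => {
    const output = model(s.prompt);
    return { scenarioId: s.id, output, score: s.score(output) };
  });
}

// Average score over a run; an empty run scores 0.
function meanScore(results: ScenarioResult[]): number {
  if (results.length === 0) return 0;
  let total = 0;
  for (const r of results) total += r.score;
  return total / results.length;
}

// Side-by-side comparison: the same pack, several models, one mean each.
function compareModels(
  pack: BenchPack,
  models: Array<{ name: string; model: ModelFn }>
): Record<string, number> {
  const summary: Record<string, number> = {};
  for (const m of models) {
    summary[m.name] = meanScore(runPack(pack, m.model));
  }
  return summary;
}
```

Keeping the scoring function inside each scenario means the runner needs no knowledge of what a given benchmark measures, which is one plausible reading of how a pack can "encapsulate" its own behavior.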

Quick Start & Requirements

For development, build the application with npm run build and package it for production with npm run pack. The README offers "Download" and "Watch demo" options for end users, which suggests that pre-compiled binaries are available. The provided text does not list specific OS, hardware, or non-default software prerequisites (e.g., CUDA, specific Python versions).

Highlighted Details

Official Bench Packs: Includes pre-built Bench Packs for tasks such as ToolCall-15, BugFind-15, DataExtract-15, InstructFollow-15, ReasonMath-15, StructOutput-15, CLI-40, and HermesAgent-20.
Agent Control API: Offers a bearer token-secured HTTP API and MCP integration for programmatic control over benchmarking workflows, including listing Bench Packs, managing providers/models, and initiating/resuming runs.
Local-First Architecture: Emphasizes a desktop-centric approach for managing LLM interactions and benchmark data.
Repo Structure: Organized into app/ (Electron shell, UI), packages/ (shared core, SDK, benchpack host), themes/, and scripts/.
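As a rough illustration of what a client of a bearer-token HTTP API with Server-Sent Events has to do, the sketch below builds the authorization headers and parses an SSE text stream into events. The parsing follows the standard text/event-stream rules (blank line ends an event, lines starting with a colon are comments, multiple data lines are joined with newlines). The function names and the event name used in the usage note are hypothetical; the real endpoints and event types are documented in docs/agent-control-api.md, which is not reproduced here.

```typescript
// Hypothetical helper names; the actual Agent Control API routes and
// event names are not shown in the source.
interface SseEvent {
  event: string; // defaults to "message" when no event field is sent
  data: string;
  id?: string;
}

// Headers for a bearer-token request that expects an event stream.
function authHeaders(token: string): Record<string, string> {
  return {
    Authorization: "Bearer " + token,
    Accept: "text/event-stream",
  };
}

// Parse a complete text/event-stream payload into discrete events.
// An event with no trailing blank line is treated as incomplete and dropped.
function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  let eventType = "";
  let dataLines: string[] = [];
  let lastId: string | undefined;

  for (const line of text.split(/\r\n|\n|\r/)) {
    if (line === "") {
      // Blank line: dispatch the buffered event if it carried any data.
      if (dataLines.length > 0) {
        const ev: SseEvent = {
          event: eventType === "" ? "message" : eventType,
          data: dataLines.join("\n"),
        };
        if (lastId !== undefined) ev.id = lastId;
        events.push(ev);
      }
      eventType = "";
      dataLines = [];
      continue;
    }
    if (line.charAt(0) === ":") continue; // comment / keep-alive line

    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.charAt(0) === " ") value = value.slice(1); // strip one space

    if (field === "event") eventType = value;
    else if (field === "data") dataLines.push(value);
    else if (field === "id") lastId = value;
    // Other fields (e.g. retry) are ignored in this sketch.
  }
  return events;
}
```

A client could pass authHeaders(token) to fetch, read the response body as text, and feed it to parseSse; for example, a payload such as "event: run.progress" followed by a JSON data line (an invented event name) would yield one event whose data can be passed to JSON.parse.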

Maintenance & Community

The provided README does not detail specific contributors, sponsorships, community channels (e.g., Discord, Slack), or roadmap information.

Licensing & Compatibility

The project is licensed under the MIT license, which is permissive and generally compatible with commercial use and closed-source linking.

Limitations & Caveats

The summary focuses on local LLM execution and Bench Pack management; it gives little detail on remote model integration or advanced configuration. The Agent Access API is documented in a separate file (docs/agent-control-api.md) rather than in the README.
