BenchLocal by stevibe

Desktop app for local LLM benchmarking and comparison

Created 1 month ago
284 stars

Top 92.0% on SourcePulse

Project Summary

A local-first desktop application for running, comparing, and managing installable LLM Bench Packs. It targets developers and researchers who need to evaluate LLM performance on real-world tasks, and offers a streamlined, reproducible benchmarking workflow with side-by-side model comparisons.

How It Works

BenchLocal is a desktop application that maintains a shared runtime for LLM providers, model registries, and installed Bench Packs. Each Bench Pack encapsulates the behavior of one benchmark: its scenario definitions, prompts, scoring logic, and verifier contracts. The application runs Bench Packs against configured local or remote models, keeps a history of runs, and lets each tab override the sampling settings. An integrated Agent Access API exposes local control surfaces through which AI agents and automation tools can drive benchmark workflows programmatically, using HTTP commands and Server-Sent Events, while the desktop UI remains running.
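The README does not spell out the Bench Pack SDK's types, so the TypeScript below is only a minimal sketch of the model described above, under assumed names. A hypothetical `BenchPack` holds `Scenario`s whose `verify` function stands in for a verifier contract. Per-tab overrides are merged over placeholder default sampling values, and one pack is run against several models to produce side-by-side pass rates. The real interfaces in packages/ are likely different; for example, real model calls are asynchronous.

```typescript
// Hypothetical model of a Bench Pack run; BenchLocal's real SDK types may differ.

interface SamplingParams {
  temperature: number;
  topP: number;
  maxTokens: number;
}

interface Scenario {
  id: string;
  prompt: string;
  // Stand-in for a verifier contract: returns true if the model output passes.
  verify: (output: string) => boolean;
}

interface BenchPack {
  name: string;
  scenarios: Scenario[];
}

// A model reduced to a synchronous prompt -> text function (real calls are async).
type Model = (prompt: string, sampling: SamplingParams) => string;

// Placeholder defaults, not BenchLocal's actual values.
const DEFAULT_SAMPLING: SamplingParams = { temperature: 0.7, topP: 1, maxTokens: 1024 };

// A per-tab override replaces only the fields it sets.
function resolveSampling(overrides: Partial<SamplingParams>): SamplingParams {
  return { ...DEFAULT_SAMPLING, ...overrides };
}

// Score = fraction of scenarios whose output passes its verifier.
function runPack(pack: BenchPack, model: Model, sampling: SamplingParams): number {
  if (pack.scenarios.length === 0) return 0;
  let passed = 0;
  for (const scenario of pack.scenarios) {
    if (scenario.verify(model(scenario.prompt, sampling))) passed++;
  }
  return passed / pack.scenarios.length;
}

// Side-by-side comparison: one pack, one sampling config, several models.
function compareModels(
  pack: BenchPack,
  models: Record<string, Model>,
  overrides: Partial<SamplingParams> = {},
): Record<string, number> {
  const sampling = resolveSampling(overrides);
  const scores: Record<string, number> = {};
  for (const name of Object.keys(models)) {
    scores[name] = runPack(pack, models[name], sampling);
  }
  return scores;
}

// Usage with a toy pack and two fake models.
const toyPack: BenchPack = {
  name: "toy-pack",
  scenarios: [
    { id: "add", prompt: "2+2=", verify: (o) => o.trim() === "4" },
    { id: "capital", prompt: "Capital of France?", verify: (o) => /paris/i.test(o) },
  ],
};
const alwaysFour: Model = () => "4";
const oracle: Model = (p) => (p === "2+2=" ? "4" : "Paris");
console.log(compareModels(toyPack, { alwaysFour, oracle }, { temperature: 0 }));
// prints { alwaysFour: 0.5, oracle: 1 }
```

Scoring as a plain pass rate is a simplifying assumption. Each Bench Pack defines its own scoring logic, which may weight scenarios or award partial credit.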

Quick Start & Requirements

For development, the application is built with npm run build and packaged for production with npm run pack. The README also offers "Download" and "Watch demo" options for end users, which suggests that pre-compiled binaries are available. The README does not list OS, hardware, or non-default software prerequisites (e.g., CUDA or specific Python versions).

Highlighted Details

  • Official Bench Packs: Includes pre-built Bench Packs for tasks such as ToolCall-15, BugFind-15, DataExtract-15, InstructFollow-15, ReasonMath-15, StructOutput-15, CLI-40, and HermesAgent-20.
  • Agent Control API: Offers a bearer-token-secured HTTP API and MCP integration for programmatic control over benchmarking workflows, including listing Bench Packs, managing providers and models, and starting or resuming runs.
  • Local-First Architecture: Emphasizes a desktop-centric approach for managing LLM interactions and benchmark data.
  • Repo Structure: Organized into app/ (Electron shell, UI), packages/ (shared core, SDK, benchpack host), themes/, and scripts/.
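As a rough illustration of how an automation client might talk to the Agent Control API, the sketch below builds a bearer-token-authenticated request and parses a Server-Sent Events stream. The base URL, port, `/runs` path, JSON payload, and `run-progress` event name are hypothetical placeholders, not BenchLocal's actual routes; docs/agent-control-api.md describes the real endpoints. The parser follows the standard SSE `event:`/`data:` line format but ignores `id:` and `retry:` fields.

```typescript
// Hypothetical client helpers for the Agent Control API.
// URL, port, path, payload shape, and event names are placeholders.

interface AgentRequest {
  url: string;
  method: "GET" | "POST";
  headers: Record<string, string>;
  body?: string;
}

// Builds a request carrying the bearer token; a payload turns it into a JSON POST.
function buildAgentRequest(baseUrl: string, path: string, token: string, payload?: unknown): AgentRequest {
  const url = baseUrl.replace(/\/+$/, "") + "/" + path.replace(/^\/+/, "");
  const headers: Record<string, string> = { Authorization: `Bearer ${token}` };
  if (payload === undefined) {
    return { url, method: "GET", headers };
  }
  headers["Content-Type"] = "application/json";
  return { url, method: "POST", headers, body: JSON.stringify(payload) };
}

interface SseEvent {
  event: string;
  data: string;
}

// Minimal Server-Sent Events parser: collects event:/data: fields and
// dispatches one event per blank line. id: and retry: are ignored.
function parseSse(stream: string): SseEvent[] {
  const events: SseEvent[] = [];
  let eventType = "";
  let dataLines: string[] = [];
  for (const line of stream.split(/\r\n|\r|\n/)) {
    if (line === "") {
      if (dataLines.length > 0) {
        events.push({ event: eventType || "message", data: dataLines.join("\n") });
      }
      eventType = "";
      dataLines = [];
      continue;
    }
    if (line.charAt(0) === ":") continue; // comment / keep-alive line
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.charAt(0) === " ") value = value.slice(1); // drop one leading space
    if (field === "event") eventType = value;
    else if (field === "data") dataLines.push(value);
  }
  return events;
}

// Usage with placeholder values (hypothetical port, path, token, and payload).
const req = buildAgentRequest("http://127.0.0.1:8787/", "/runs", "example-token", { pack: "ToolCall-15" });
console.log(req.method, req.url); // prints "POST http://127.0.0.1:8787/runs"

// Made-up stream: one named event, a keep-alive comment, one multi-line message.
const sample = 'event: run-progress\ndata: {"done":3,"total":15}\n\n: keep-alive\n\ndata: hello\ndata: world\n\n';
for (const ev of parseSse(sample)) {
  console.log(ev.event, ev.data);
}
```

In Node 18+ or a browser, the returned object can be passed to `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`. The MCP integration is a separate access path and is not covered by this sketch.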

Maintenance & Community

The README does not list contributors, sponsorships, community channels (e.g., Discord, Slack), or a roadmap.

Licensing & Compatibility

The project is licensed under the MIT license, which is permissive and generally compatible with commercial use and closed-source linking.

Limitations & Caveats

The project's primary focus is local LLM execution and Bench Pack management; remote model integration and advanced configuration are not described in detail. The Agent Access API is documented separately in docs/agent-control-api.md.

Health Check
Last Commit: 5 days ago
Responsiveness: Inactive
Pull Requests (30d): 1
Issues (30d): 11
Star History: 101 stars in the last 30 days
