stevibe
Desktop app for local LLM benchmarking and comparison
Top 92.0% on SourcePulse
BenchLocal is a local-first desktop application for running, comparing, and managing installable LLM Bench Packs. It targets developers and researchers who need to evaluate LLM performance on real-world tasks, and offers a streamlined, reproducible benchmarking workflow with side-by-side model comparisons.
How It Works
BenchLocal operates as a desktop application, managing a shared runtime for LLM providers, model registries, and Bench Pack installations. Each Bench Pack encapsulates specific benchmark behaviors, including scenario definitions, prompts, scoring logic, and verifier contracts. The application orchestrates the execution of these Bench Packs against configured local or remote models, manages run history, and provides per-tab sampling overrides. An integrated Agent Access API exposes local agent surfaces, enabling AI agents and automation tools to control benchmark workflows programmatically via HTTP commands and Server-Sent Events, while the desktop UI remains active.
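Since the Agent Access API is described as streaming updates over Server-Sent Events, a client has to split the event stream into individual events. The following is a minimal sketch of a generic SSE text parser in TypeScript. It is not taken from BenchLocal: the actual endpoints and event names are documented in docs/agent-control-api.md and are not reproduced here, so the `run-progress` event name and the payloads in the sample are hypothetical.

```typescript
// Generic Server-Sent Events parser (illustrative; not BenchLocal's own code).
interface SseEvent {
  event: string; // event type; defaults to "message" when no "event:" field is sent
  data: string;  // all "data:" lines of the event, joined with "\n"
}

function parseSse(text: string): SseEvent[] {
  const events: SseEvent[] = [];
  let eventType = "";
  let dataLines: string[] = [];

  // Emit the buffered event (if it carries data) and reset the buffers.
  const flush = (): void => {
    if (dataLines.length > 0) {
      events.push({ event: eventType || "message", data: dataLines.join("\n") });
    }
    eventType = "";
    dataLines = [];
  };

  for (const line of text.split(/\r\n|\r|\n/)) {
    if (line === "") {
      flush(); // a blank line terminates the current event
      continue;
    }
    if (line.charAt(0) === ":") {
      continue; // comment line, often used as a keep-alive
    }
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.charAt(0) === " ") {
      value = value.slice(1); // a single leading space after the colon is dropped
    }
    if (field === "event") {
      eventType = value;
    } else if (field === "data") {
      dataLines.push(value);
    }
    // "id" and "retry" fields are ignored in this sketch.
  }
  // An event not yet terminated by a blank line is left undispatched.
  return events;
}

// Hypothetical stream contents; real event names may differ.
const sample =
  "event: run-progress\ndata: {\"done\":1}\n\n" +
  ": keep-alive\n\n" +
  "data: line one\ndata: line two\n\n";

const parsed = parseSse(sample);
console.log(parsed);
```

In a real client the text would arrive in chunks from an HTTP response body, so the parser would need to keep any trailing partial line between chunks. The sketch parses a complete string to keep the field-handling rules easy to follow.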
Quick Start & Requirements
For development, the application can be built using npm run build and packaged for production with npm run pack. The README indicates the availability of "Download" and "Watch demo" options for end-users, suggesting pre-compiled binaries. Specific OS, hardware, or non-default software prerequisites (e.g., CUDA, specific Python versions) are not detailed in the provided text.
Highlighted Details
Repository layout: app/ (Electron shell, UI), packages/ (shared core, SDK, benchpack host), themes/, and scripts/.
Maintenance & Community
The provided README does not detail specific contributors, sponsorships, community channels (e.g., Discord, Slack), or roadmap information.
Licensing & Compatibility
The project is licensed under the MIT license, which is permissive and generally compatible with commercial use and closed-source linking.
Limitations & Caveats
The README focuses on local LLM execution and Bench Pack management; it gives little detail on remote model integration or advanced configuration. The Agent Access API is documented in a separate file (docs/agent-control-api.md).