Desktop app to evaluate/compare LLMs
This project provides a desktop application for evaluating and comparing Large Language Models (LLMs) served via Ollama. It targets users who need to systematically test different models, prompts, and inference parameters to find the best configuration for their use case, offering a visual interface for inspecting results and running A/B tests.
How It Works
The application uses Ollama's API to interact with locally served LLMs. It implements a grid search by iterating through every user-defined combination of models, prompts, and inference parameters (e.g., temperature, top_p). This systematic sweep enables comprehensive testing, with side-by-side A/B comparison of prompts or models and experiment logs so runs can be reproduced.
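The core loop can be approximated with a short script against Ollama's standard /api/generate endpoint. This is a minimal sketch, not the app's actual implementation: the model names, prompts, and parameter grids below are placeholder assumptions, while the endpoint, request fields, and options shown are Ollama's documented generate API.

```python
import itertools
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint

# Hypothetical grid: the app sweeps every combination of these axes.
models = ["llama3.1", "mistral"]
prompts = ["Summarize: {text}", "List the key points of: {text}"]
temperatures = [0.2, 0.8]
top_ps = [0.9, 1.0]

text = "Large language models can be tuned with sampling parameters."

results = []
for model, prompt, temperature, top_p in itertools.product(
    models, prompts, temperatures, top_ps
):
    response = requests.post(
        OLLAMA_URL,
        json={
            "model": model,
            "prompt": prompt.format(text=text),
            "stream": False,  # return one JSON object instead of a token stream
            "options": {"temperature": temperature, "top_p": top_p},
        },
        timeout=300,
    )
    response.raise_for_status()
    body = response.json()
    # Log the configuration alongside the output so runs can be compared later.
    results.append(
        {
            "model": model,
            "prompt": prompt,
            "temperature": temperature,
            "top_p": top_p,
            "output": body.get("response", ""),
        }
    )

for r in results:
    print(f"{r['model']} | temp={r['temperature']} top_p={r['top_p']}")
    print(r["output"])
    print()
```

In the desktop app, each entry in such a results log would correspond to one cell of the comparison view, which is what makes side-by-side A/B inspection and reproducible reruns possible.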
Quick Start & Requirements
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project only evaluates models served through Ollama; other LLM serving frameworks are not supported. Result grading and sharing are planned features but are not yet available.