ollama-grid-search by dezoito

Desktop app to evaluate/compare LLMs

created 1 year ago
796 stars

Top 45.1% on sourcepulse

Project Summary

This project provides a desktop application for evaluating and comparing Large Language Models (LLMs) served via Ollama. It targets users who need to systematically test different models, prompts, and inference parameters to find optimal configurations for their use cases, offering a visual interface for results inspection and A/B testing.

How It Works

The application leverages Ollama's API to interact with various LLMs. It implements a "grid search" concept by iterating through user-defined combinations of models, prompts, and inference parameters (e.g., temperature, top_p). This systematic approach allows for comprehensive testing, with features for A/B testing different prompts or models side-by-side and managing experiment logs for reproducibility.
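The grid-search idea described above can be sketched in a few lines of Python. This is an illustration only, not the application's own code: `expand_grid` is a hypothetical helper, the model names and prompt are placeholders, and the payload shape assumes Ollama's `/api/generate` request format (`model`, `prompt`, `stream`, `options`).

```python
import itertools


def expand_grid(models, prompts, param_grid):
    """Yield one request payload per (model, prompt, parameter values) combination.

    param_grid maps an inference parameter name to the list of values to try.
    """
    names = sorted(param_grid)                      # fixed order for reproducibility
    value_lists = [param_grid[name] for name in names]
    for model, prompt, *values in itertools.product(models, prompts, *value_lists):
        yield {
            "model": model,
            "prompt": prompt,
            "stream": False,                        # ask for a single JSON response
            "options": dict(zip(names, values)),    # e.g. {"temperature": 0.2, "top_p": 0.9}
        }


# Placeholder inputs; substitute models that your Ollama server actually serves.
payloads = list(
    expand_grid(
        models=["llama3", "mistral"],
        prompts=["Explain grid search in one sentence."],
        param_grid={"temperature": [0.2, 0.8], "top_p": [0.9, 0.95]},
    )
)
print(len(payloads))  # 2 models x 1 prompt x 2 temperatures x 2 top_p values = 8
```

Each payload could then be sent to a running Ollama server, one request per combination. The number of requests grows multiplicatively with every added model, prompt, or parameter value, so small grids are a sensible starting point.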

Quick Start & Requirements

  • Install: Download pre-built binaries from the releases page.
  • Prerequisites: Ollama must be installed and serving models locally or remotely.
  • Setup: Minimal setup required if Ollama is already running.
  • Docs: Project Blog Post
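Before launching the app, it can help to confirm that an Ollama server is reachable and see which models it offers. The sketch below is a hedged example rather than part of the project: the helper names are hypothetical, and it assumes Ollama's default address (`http://localhost:11434`) and its `/api/tags` endpoint, which lists available models.

```python
import json
import urllib.request

DEFAULT_OLLAMA_URL = "http://localhost:11434"  # assumed default; change for remote servers


def parse_model_names(tags_response):
    """Extract model names from a decoded /api/tags response body."""
    return [entry["name"] for entry in tags_response.get("models", [])]


def fetch_model_names(base_url=DEFAULT_OLLAMA_URL, timeout=5.0):
    """Return the list of model names, or None if the server cannot be reached."""
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return parse_model_names(json.load(response))
    except (OSError, ValueError):
        # OSError covers connection failures and HTTP errors;
        # ValueError covers a body that is not valid JSON.
        return None
```

Calling `fetch_model_names()` returns `None` when nothing is listening at the given address, which is a quick signal that Ollama still needs to be started or that the URL is wrong.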

Highlighted Details

  • Automated fetching of models from local or remote Ollama servers.
  • Supports side-by-side A/B testing of prompts and models.
  • Includes a prompt archive with "/" autocompletion for prompt selection.
  • Reports inference metadata such as elapsed time and tokens per second.
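As a rough illustration of the metadata item above, tokens per second can be derived from the timing fields Ollama includes in a non-streaming generation response. This is a sketch under assumptions, not the app's implementation: `tokens_per_second` is a hypothetical helper, the sample values are invented for demonstration, and the field names (`eval_count`, `eval_duration` in nanoseconds) follow Ollama's API documentation.

```python
def tokens_per_second(response):
    """Compute generation speed from an Ollama response dict.

    Returns None when the timing fields are missing or zero.
    """
    eval_count = response.get("eval_count")            # number of generated tokens
    eval_duration_ns = response.get("eval_duration")   # generation time in nanoseconds
    if not eval_count or not eval_duration_ns:
        return None
    return eval_count / (eval_duration_ns / 1e9)


# Invented sample values, for illustration only.
sample = {"eval_count": 120, "eval_duration": 2_000_000_000}
print(tokens_per_second(sample))  # 120 tokens / 2.0 s = 60.0
```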

Maintenance & Community

  • Active development with contributions from multiple individuals.
  • Development notes and workflow charts are available for contributors.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: Permissive license suitable for commercial use and integration with closed-source applications.

Limitations & Caveats

The project targets models served through Ollama; other LLM serving frameworks are not supported. Planned features include result grading and sharing.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 81 stars in the last 90 days
