ollama-grid-search by dezoito

Desktop app to evaluate/compare LLMs

Created 1 year ago
831 stars

Top 42.8% on SourcePulse

Project Summary

This project provides a desktop application for evaluating and comparing Large Language Models (LLMs) served via Ollama. It targets users who need to systematically test different models, prompts, and inference parameters to find optimal configurations for their use cases, offering a visual interface for results inspection and A/B testing.

How It Works

The application leverages Ollama's API to interact with various LLMs. It implements a "grid search" concept by iterating through user-defined combinations of models, prompts, and inference parameters (e.g., temperature, top_p). This systematic approach allows for comprehensive testing, with features for A/B testing different prompts or models side-by-side and managing experiment logs for reproducibility.
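
To make the idea concrete, the sketch below iterates over a small parameter grid against Ollama's HTTP generate endpoint. It is not the app's actual implementation; the endpoint URL, model names, and parameter values are illustrative assumptions.

```ts
// Minimal grid-search sketch against Ollama's /api/generate endpoint.
// The endpoint, models, and parameter grids below are illustrative only.
async function gridSearch(prompt: string): Promise<void> {
  const models = ["llama3.2", "mistral"];   // candidate models (assumed to be pulled)
  const temperatures = [0.2, 0.7, 1.0];     // sampling temperatures to test
  const topPs = [0.9, 1.0];                 // nucleus-sampling values to test

  for (const model of models) {
    for (const temperature of temperatures) {
      for (const top_p of topPs) {
        // One non-streaming generation per model/parameter combination.
        const res = await fetch("http://localhost:11434/api/generate", {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({
            model,
            prompt,
            stream: false,                    // return a single JSON object
            options: { temperature, top_p },  // inference parameters under test
          }),
        });
        const data = await res.json();
        console.log(`${model} (temp=${temperature}, top_p=${top_p}):`, data.response);
      }
    }
  }
}

gridSearch("Summarize the benefits of grid search in one sentence.").catch(console.error);
```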

Quick Start & Requirements

  • Install: Download pre-built binaries from the releases page.
  • Prerequisites: Ollama must be installed and serving models, locally or remotely (a quick reachability check is sketched after this list).
  • Setup: Minimal if Ollama is already running.
  • Docs: Project Blog Post
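
Before launching the app, it can help to confirm that an Ollama server is reachable and already has models pulled. A minimal check, assuming Ollama's default endpoint at http://localhost:11434 (point the URL at a remote host if needed):

```ts
// Sketch: verify that an Ollama server is reachable and list the models it serves.
// http://localhost:11434 is Ollama's default address; adjust for a remote server.
const OLLAMA_URL = "http://localhost:11434";

async function listModels(): Promise<string[]> {
  const res = await fetch(`${OLLAMA_URL}/api/tags`);
  if (!res.ok) {
    throw new Error(`Ollama is not reachable (HTTP ${res.status})`);
  }
  const data = await res.json();
  // /api/tags returns { models: [{ name, size, ... }, ...] }
  return data.models.map((m: { name: string }) => m.name);
}

listModels()
  .then((names) => console.log("Available models:", names))
  .catch(console.error);
```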

Highlighted Details

  • Automated fetching of models from local or remote Ollama servers.
  • Supports A/B testing of prompts and models simultaneously.
  • Includes a prompt archive with "/" autocompletion for prompt selection.
  • Outputs inference metadata such as response time and tokens per second (illustrated in the sketch below).
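
Ollama's generate response includes this metadata, with durations reported in nanoseconds; the sketch below shows one way such figures can be derived (the model name and prompt are illustrative).

```ts
// Sketch: derive time and tokens-per-second from one non-streaming generation.
// Ollama reports total_duration and eval_duration in nanoseconds.
async function measure(model: string, prompt: string): Promise<void> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, prompt, stream: false }),
  });
  const data = await res.json();
  const totalSeconds = data.total_duration / 1e9;                      // end-to-end time
  const tokensPerSecond = data.eval_count / (data.eval_duration / 1e9);
  console.log(`${model}: ${totalSeconds.toFixed(2)} s, ${tokensPerSecond.toFixed(1)} tok/s`);
}

measure("llama3.2", "Write a haiku about grid search.").catch(console.error);
```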

Maintenance & Community

  • Active development with contributions from multiple individuals.
  • Development notes and workflow charts are available for contributors.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: Permissive license suitable for commercial use and integration with closed-source applications.

Limitations & Caveats

The project is focused on evaluating models served through Ollama; other LLM serving frameworks are not supported. Planned features such as result grading and sharing are not yet available.

Health Check

  • Last Commit: 5 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 20 stars in the last 30 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), Edward Z. Yang (Research Engineer at Meta; maintainer of PyTorch), and 5 more.

yet-another-applied-llm-benchmark by carlini
LLM benchmark for evaluating models on previously asked programming questions
Top 0.2%, 1k stars, created 1 year ago, updated 4 months ago
Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Pawel Garbacki (Cofounder of Fireworks AI), and 3 more.

promptbench by microsoft
LLM evaluation framework
Top 0.1%, 3k stars, created 2 years ago, updated 1 month ago
Starred by Anastasios Angelopoulos (Cofounder of LMArena), Chip Huyen (Author of "AI Engineering" and "Designing Machine Learning Systems"), and 34 more.

evals by openai
Framework for evaluating LLMs and LLM systems, plus benchmark registry
Top 0.2%, 17k stars, created 2 years ago, updated 9 months ago