Interactive tool for side-by-side LLM evaluation
LLM Comparator is an interactive visualization tool and Python library for analyzing side-by-side evaluations of large language models (LLMs). It lets users qualitatively assess differences between LLM responses at both the example and slice level, helping them discover patterns and reasons behind performance differences. The tool is aimed primarily at researchers and developers evaluating LLM outputs.
How It Works
The tool visualizes data from JSON files containing comparative LLM responses. Each entry includes the input prompt, the outputs of two models (A and B), and a score indicating which response is preferred (e.g., from an LLM-as-a-judge system). Entries can also carry rich metadata and custom fields, such as word counts, stylistic markers, or categorical tags, which the tool renders as interactive charts and tables for detailed analysis of response characteristics.
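For concreteness, a single entry might be sketched in Python as below. Only input_text, output_text_a, output_text_b, and score are fields documented here; the score sign convention and the custom_fields keys shown are illustrative assumptions, not a verified schema.

# One side-by-side evaluation record. The four core fields come from the
# README; the score convention and custom_fields keys are assumed for
# illustration only.
example = {
    "input_text": "Summarize the plot of Hamlet in two sentences.",
    "output_text_a": "Prince Hamlet seeks revenge ...",
    "output_text_b": "Hamlet, a tragedy by Shakespeare, follows ...",
    "score": 0.5,  # e.g., positive when the judge prefers model A (assumed)
    "custom_fields": {
        "word_count_a": 12,   # hypothetical per-response metadata
        "word_count_b": 15,
        "category": "literature",
    },
}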
Quick Start & Requirements
git clone https://github.com/PAIR-code/llm-comparator.git
cd llm-comparator
npm install     # install dependencies
npm run build   # build the web client
npm run serve   # serve the tool locally
Input JSON files must include the fields input_text, output_text_a, output_text_b, and score.
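As a minimal sketch of producing such a file with Python's standard json module, the snippet below checks the four required fields before writing. The top-level layout (model names plus an examples list) and the filename are assumptions for illustration, not a verified schema.

import json

REQUIRED_FIELDS = ("input_text", "output_text_a", "output_text_b", "score")

def write_eval_file(path, examples, model_a="model-a", model_b="model-b"):
    # Fail early if any record lacks one of the documented required fields.
    for i, ex in enumerate(examples):
        missing = [f for f in REQUIRED_FIELDS if f not in ex]
        if missing:
            raise ValueError(f"example {i} is missing fields: {missing}")
    # Assumed top-level layout: model names plus the example records.
    payload = {
        "models": [{"name": model_a}, {"name": model_b}],
        "examples": list(examples),
    }
    with open(path, "w") as f:
        json.dump(payload, f, indent=2)

write_eval_file("comparison.json", [{
    "input_text": "2 + 2 = ?",
    "output_text_a": "4",
    "output_text_b": "The answer is 4.",
    "score": -0.25,
}])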
Highlighted Details
Maintenance & Community
This is a research project under active development by the PAIR team. Further details and potential community engagement channels are not explicitly listed in the README.
Licensing & Compatibility
No license is specified for the project. The disclaimer "This is not an official Google product" indicates that Google does not officially support it, and the unspecified license leaves the terms for commercial use or integration into proprietary systems unclear.
Limitations & Caveats
The project is described as early-stage and may contain bugs. Because no license is specified, commercial adoption or integration into closed-source projects may be difficult.