LLM/RAG evaluation framework
Tonic Validate is an open-source framework designed to evaluate the quality of responses from Large Language Models (LLMs) and Retrieval Augmented Generation (RAG) applications. It offers a suite of metrics to assess aspects like answer correctness, retrieval relevance, and hallucination, with an optional UI for visualizing results, targeting developers and researchers building LLM-powered systems.
How It Works
The framework operates by accepting benchmark data (questions, reference answers) and LLM outputs (generated answers, retrieved contexts). It then applies various metrics, many of which leverage an LLM (defaulting to GPT-4 Turbo, but configurable to OpenAI, Azure OpenAI, Gemini, Claude, Mistral, Cohere, Together AI, and AWS Bedrock) to score the quality of the LLM's response against the provided inputs. Users can either provide a callback function to capture LLM responses or manually log them for scoring.
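The benchmark-plus-callback flow above can be sketched in plain Python. All names here (`BenchmarkItem`, `LLMResponse`, `get_llm_response`) are hypothetical stand-ins, and the toy token-overlap metric replaces the LLM-judge scoring that Tonic Validate actually performs; this is a conceptual sketch of the data flow, not the library's API.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for the framework's concepts: a benchmark item
# pairs a question with a reference answer; a response records what the
# LLM produced and which context chunks were retrieved.
@dataclass
class BenchmarkItem:
    question: str
    reference_answer: str

@dataclass
class LLMResponse:
    answer: str
    retrieved_context: list = field(default_factory=list)

def get_llm_response(question: str) -> LLMResponse:
    # Stand-in callback; a real one would invoke your RAG pipeline.
    return LLMResponse(
        answer="The capital of France is Paris.",
        retrieved_context=["France's capital city is Paris."],
    )

def overlap_score(response: LLMResponse, item: BenchmarkItem) -> float:
    # Toy metric: token overlap between answer and reference answer.
    # Tonic Validate's real metrics instead ask an LLM judge to score
    # correctness, retrieval relevance, hallucination, etc.
    def tokens(text: str) -> set:
        return set(text.lower().replace(".", "").split())
    ref = tokens(item.reference_answer)
    return len(ref & tokens(response.answer)) / len(ref) if ref else 0.0

benchmark = [BenchmarkItem("What is the capital of France?",
                           "Paris is the capital of France.")]
scores = [overlap_score(get_llm_response(it.question), it) for it in benchmark]
```

The alternative path mentioned above, manually logging responses instead of supplying a callback, would simply mean building the `LLMResponse` objects yourself and passing them to the scoring step.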
Quick Start & Requirements
pip install tonic-validate
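Since the default judge model is GPT-4 Turbo, the evaluation step typically needs OpenAI credentials in the environment. The variable name below is an assumption based on the standard OpenAI client, not something stated in this summary:

```shell
# Assumed: the default OpenAI-backed judge reads the standard
# OPENAI_API_KEY environment variable.
export OPENAI_API_KEY="..."
```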
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Usage telemetry is collected by default; it can be disabled by setting the TONIC_VALIDATE_DO_NOT_TRACK environment variable.
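The opt-out variable can be set from the shell or in code before the library is imported. The accepted value shown here is an assumption; check the project documentation for the exact convention:

```python
import os

# Assumed opt-out value; Tonic Validate checks this environment
# variable to decide whether to disable usage telemetry.
os.environ["TONIC_VALIDATE_DO_NOT_TRACK"] = "True"
```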