tonic_validate by TonicAI

LLM/RAG evaluation framework

created 1 year ago
315 stars

Top 86.9% on sourcepulse

Project Summary

Tonic Validate is an open-source framework for evaluating the quality of responses from Large Language Models (LLMs) and Retrieval Augmented Generation (RAG) applications. Aimed at developers and researchers building LLM-powered systems, it offers a suite of metrics that assess answer correctness, retrieval relevance, and hallucination, along with an optional UI for visualizing results.

How It Works

The framework accepts benchmark data (questions and reference answers) and LLM outputs (generated answers and retrieved contexts). It then applies various metrics, many of which leverage an evaluator LLM (GPT-4 Turbo by default, configurable to models from OpenAI, Azure OpenAI, Gemini, Claude, Mistral, Cohere, Together AI, and AWS Bedrock) to score the quality of the LLM's response against the provided inputs. Users can either provide a callback function that captures LLM responses or manually log responses for scoring.
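
For concreteness, here is a minimal sketch of the callback flow based on the project's README, assuming the Benchmark and ValidateScorer API (names and defaults may vary between versions; the stubbed callback stands in for a real RAG pipeline):

```python
from tonic_validate import Benchmark, ValidateScorer

# A benchmark pairs each question with a reference answer.
benchmark = Benchmark(
    questions=["What is the capital of France?"],
    answers=["Paris"],
)

# The callback returns the generated answer and the retrieved context
# it was grounded on; replace the stubbed values with a real RAG call.
def get_rag_response(question):
    return {
        "llm_answer": "Paris",
        "llm_context_list": ["Paris is the capital of France."],
    }

# With no arguments, the scorer uses the default evaluator (GPT-4 Turbo).
scorer = ValidateScorer()
run = scorer.score(benchmark, get_rag_response)
print(run.overall_scores)
```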

Quick Start & Requirements

  • Install: pip install tonic-validate
  • Prerequisites: an OpenAI API key (or other supported LLM provider keys) set as environment variables (see the snippet after this list). Python 3.9+ is required for Gemini.
  • Documentation: Explore the docs »
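
As a sketch, the key can be supplied from Python before constructing a scorer (the variable below is OpenAI's; other providers read their own keys):

```python
import os

# The OpenAI-backed metrics read the key from the environment;
# set it before creating a ValidateScorer.
os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder, use your own key
```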

Highlighted Details

  • Supports a wide range of LLM providers for metric evaluation.
  • Offers metrics such as Answer Similarity, Retrieval Precision, Augmentation Precision, Augmentation Accuracy, Answer Consistency, Latency, and Contains Text (see the sketch after this list).
  • Includes an optional, free UI for visualizing evaluation results and tracking performance over time.
  • Provides a GitHub Action for integrating evaluations into CI/CD pipelines.
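
A hedged sketch of scoring with a subset of those metrics, assuming the metric classes exposed under tonic_validate.metrics (check your installed version for the exact names):

```python
from tonic_validate import ValidateScorer
from tonic_validate.metrics import (
    AnswerConsistencyMetric,
    AnswerSimilarityMetric,
    RetrievalPrecisionMetric,
)

# Each LLM-assisted metric issues its own evaluator calls, so scoring
# only the metrics you need keeps cost and latency down.
scorer = ValidateScorer([
    AnswerSimilarityMetric(),
    RetrievalPrecisionMetric(),
    AnswerConsistencyMetric(),
])
```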

Maintenance & Community

  • Active development with contributions encouraged via pull requests and issues.
  • Community support via GitHub issues.

Licensing & Compatibility

  • MIT License. Permissive for commercial use and integration with closed-source applications.

Limitations & Caveats

  • Metric scoring relies on external LLM APIs, incurring potential costs and latency.
  • Telemetry is collected by default; users can opt out by setting the TONIC_VALIDATE_DO_NOT_TRACK environment variable, as shown below.
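
For example, telemetry can be disabled by exporting the variable before the library is used (the variable name is from the project docs; "True" as the value is an assumption based on them):

```python
import os

# Opt out of telemetry before importing/using tonic_validate.
os.environ["TONIC_VALIDATE_DO_NOT_TRACK"] = "True"
```
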
Health Check

  • Last commit: 3 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 1
  • Issues (30d): 0
  • Star History: 19 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems) and Jerry Liu (cofounder of LlamaIndex).

deepeval by confident-ai

LLM evaluation framework for unit testing LLM outputs

  • Top 2.0% on sourcepulse
  • 10k stars
  • created 2 years ago
  • updated 11 hours ago