tonic_validate by TonicAI

LLM/RAG evaluation framework

Created 1 year ago
318 stars

Top 85.0% on SourcePulse

Project Summary

Tonic Validate is an open-source framework for evaluating the quality of responses from Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) applications. It provides a suite of metrics covering answer correctness, retrieval relevance, and hallucination, plus an optional UI for visualizing results, and targets developers and researchers building LLM-powered systems.

How It Works

The framework accepts benchmark data (questions and reference answers) together with LLM outputs (generated answers and retrieved contexts). It then applies various metrics, many of which use an evaluator LLM (GPT-4 Turbo by default, configurable to models from OpenAI, Azure OpenAI, Gemini, Claude, Mistral, Cohere, Together AI, and AWS Bedrock) to score the quality of the LLM's response against the provided inputs. Users can either supply a callback function that produces LLM responses or manually log responses for scoring.
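The callback flow can be sketched as follows. This is a minimal example assuming the `Benchmark`/`ValidateScorer` classes and the `llm_answer`/`llm_context_list` response keys from the project's docs; the RAG call itself is stubbed out.

```python
from tonic_validate import Benchmark, ValidateScorer

# Benchmark data: questions paired with reference answers.
benchmark = Benchmark(
    questions=["What is the capital of France?"],
    answers=["Paris"],
)

# Callback that runs your RAG pipeline for one question and returns the
# generated answer plus the retrieved context chunks used to produce it.
def get_rag_response(question: str) -> dict:
    # Hypothetical stub; replace with a real call into your RAG application.
    return {
        "llm_answer": "Paris",
        "llm_context_list": ["France's capital city is Paris."],
    }

# Scores every benchmark item with the configured evaluator LLM
# (GPT-4 Turbo by default).
scorer = ValidateScorer()
run = scorer.score(benchmark, get_rag_response)
print(run.overall_scores)
```

The docs also describe a path for scoring manually logged responses without a callback.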

Quick Start & Requirements

  • Install: pip install tonic-validate
  • Prerequisites: an OpenAI API key (or a key for another supported LLM provider) set as an environment variable; a minimal setup sketch follows this list. Python 3.9+ is required for Gemini.
  • Documentation: Explore the docs »
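The keys are read from the environment, so they can typically also be set from Python before scoring. A minimal sketch (the placeholder value is not a real key):

```python
import os

# Evaluator credentials for the default OpenAI backend; other providers
# (Azure OpenAI, Gemini, Claude, ...) use their own key variables.
os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder; load from a secret store in practice
```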

Highlighted Details

  • Supports a wide range of LLM providers for metric evaluation.
  • Offers metrics such as Answer Similarity, Retrieval Precision, Augmentation Precision, Augmentation Accuracy, Answer Consistency, Latency, and Contains Text (selecting a subset is sketched after this list).
  • Includes an optional, free UI for visualizing evaluation results and tracking performance over time.
  • Provides a GitHub Action for integrating evaluations into CI/CD pipelines.
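Metric selection can be narrowed when constructing the scorer. A minimal sketch, assuming the metric classes are importable from `tonic_validate.metrics` as in the project docs (exact names may differ between versions):

```python
from tonic_validate import ValidateScorer
from tonic_validate.metrics import (
    AnswerConsistencyMetric,
    AnswerSimilarityMetric,
    RetrievalPrecisionMetric,
)

# Score only the metrics relevant to this experiment instead of the defaults.
scorer = ValidateScorer(
    metrics=[
        AnswerSimilarityMetric(),
        RetrievalPrecisionMetric(),
        AnswerConsistencyMetric(),
    ]
)
```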

Maintenance & Community

  • Active development with contributions encouraged via pull requests and issues.
  • Community support via GitHub issues.

Licensing & Compatibility

  • MIT License. Permissive for commercial use and integration with closed-source applications.

Limitations & Caveats

  • Metric scoring relies on external LLM APIs, incurring potential costs and latency.
  • Telemetry is collected by default; it can be disabled by setting the TONIC_VALIDATE_DO_NOT_TRACK environment variable (see the sketch below).
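Opting out can be done from the shell or programmatically before the library is used. A minimal sketch; the expected value is an assumption here, so check the docs for the exact setting:

```python
import os

# Disable telemetry collection; "True" is assumed as the accepted value.
os.environ["TONIC_VALIDATE_DO_NOT_TRACK"] = "True"
```
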
Health Check

  • Last Commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 30 days

Explore Similar Projects

Starred by Nir Gazit (Cofounder of Traceloop), Jared Palmer (Ex-VP AI at Vercel; Founder of Turborepo; Author of Formik, TSDX), and 3 more.

haven by redotvideo
  • LLM fine-tuning and evaluation platform
  • 0% · 346 stars
  • Created 2 years ago, updated 1 year ago

Starred by Morgan Funtowicz (Head of ML Optimizations at Hugging Face), Luis Capelo (Cofounder of Lightning AI), and 7 more.

lighteval by huggingface
  • LLM evaluation toolkit for multiple backends
  • 2.6% · 2k stars
  • Created 1 year ago, updated 1 day ago

Starred by Luis Capelo (Cofounder of Lightning AI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 6 more.

opik by comet-ml
  • Open-source LLM evaluation framework for RAG, agents, and more
  • 1.7% · 14k stars
  • Created 2 years ago, updated 14 hours ago