autoevals by braintrustdata

Evaluation tool for AI model outputs using automatic methods

Created 2 years ago
625 stars

Top 52.9% on SourcePulse

Project Summary

AutoEvals is a Python and TypeScript library for evaluating AI model outputs using a variety of methods, including LLM-as-a-judge, statistical, and heuristic approaches. It aims to simplify the process of debugging, comparing, and managing AI evaluations, making it easier for developers and researchers to assess model performance across subjective tasks like fact-checking and safety.

How It Works

AutoEvals provides a unified interface across diverse evaluation metrics, normalizing every result to a 0-1 score. It handles error-prone plumbing such as parsing LLM-generated outputs, and makes individual evaluations easier to debug by allowing flexible prompt tweaking and direct output inspection. The library also supports custom evaluation prompts and user-defined scoring functions, enabling tailored assessment workflows.
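As a minimal sketch of that unified interface, using two built-in heuristic scorers (the top-level Levenshtein and ExactMatch imports follow the documented pattern, but treat the exact import path as an assumption):

    from autoevals import ExactMatch, Levenshtein

    # Heuristic scorers need no API key; each call returns a Score
    # object whose .score field is normalized to the 0-1 range.
    lev = Levenshtein()(output="linked in", expected="linkedin")
    exact = ExactMatch()(output="42", expected="42")

    print(lev.score)    # e.g. ~0.89 (edit distance normalized by length)
    print(exact.score)  # 1.0 on an exact match, 0.0 otherwise

LLM-as-a-judge scorers return the same Score shape, so heuristic and model-graded results can be logged and compared side by side.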

Quick Start & Requirements

  • Python: pip install autoevals (see the first-evaluation sketch after this list)
  • TypeScript: npm install autoevals
  • Requirements: Python 3.9+, OpenAI Python SDK v0.x/v1.x compatible. Requires OPENAI_API_KEY environment variable for default OpenAI usage.
  • Docs: https://www.braintrust.dev/docs/reference/autoevals
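A first evaluation, closely following the example in the project docs (assumes OPENAI_API_KEY is set; the question and answers are illustrative):

    from autoevals.llm import Factuality

    evaluator = Factuality()
    result = evaluator(
        output="People's Republic of China",
        expected="China",
        input="Which country has the highest population?",
    )
    print(result.score)                  # 0-1 factuality score
    print(result.metadata["rationale"])  # the judge's reasoning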

Highlighted Details

  • Supports LLM-as-a-judge evaluations for tasks like Factuality, Moderation, and SQL.
  • Includes heuristic (Levenshtein, Exact Match) and statistical (BLEU) methods.
  • Enables custom LLM classifiers with user-defined prompts and scoring logic (sketched after this list).
  • Integrates with Braintrust for logging and comparison of evaluation results.
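The custom-classifier bullet can be made concrete with a short sketch; the LLMClassifier name, prompt_template placeholder, and choice_scores mapping follow the documented pattern, but treat the exact parameters as assumptions:

    from autoevals import LLMClassifier

    # Map each classifier choice to a 0-1 score; use_cot asks the
    # judge to reason step by step before committing to a choice.
    tone = LLMClassifier(
        name="Tone",
        prompt_template="Is this response formal or casual?\n\n{{output}}",
        choice_scores={"Formal": 1, "Casual": 0},
        use_cot=True,
    )

    result = tone(output="Hey dude, what's up?")
    print(result.score)  # expected to be near 0 for a casual reply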

Maintenance & Community

Developed by the team at Braintrust. Contribution guidelines and development setup are available in the README.

Licensing & Compatibility

The library appears to be open-source, but the README provided does not state a specific license. Commercial use or closed-source linking would therefore require clarifying the license first.

Limitations & Caveats

The README does not explicitly state the license, which could be a blocker for commercial adoption. While it supports various AI providers via OpenAI-compatible APIs, specific provider configurations might require further investigation.
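For non-OpenAI providers, one plausible configuration path is routing an OpenAI-compatible client through the library's init hook; the init(client=...) call and the endpoint below are assumptions to verify against the docs for your installed version:

    from openai import OpenAI

    from autoevals import init
    from autoevals.llm import Factuality

    # Hypothetical OpenAI-compatible endpoint and key; substitute your
    # provider's real values.
    client = OpenAI(base_url="https://my-provider.example.com/v1", api_key="...")
    init(client=client)  # scorers created afterwards use this client

    result = Factuality()(
        output="Paris",
        expected="Paris",
        input="What is the capital of France?",
    )
    print(result.score)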

Health Check

  • Last Commit: 2 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 5
  • Issues (30d): 0
  • Star History: 43 stars in the last 30 days

Explore Similar Projects

Starred by Morgan Funtowicz (Head of ML Optimizations at Hugging Face), Luis Capelo (Cofounder of Lightning AI), and 7 more.

lighteval by huggingface

LLM evaluation toolkit for multiple backends

Created 1 year ago, updated 1 day ago
Top 2.6% on SourcePulse
2k stars
Starred by Omar Sanseviero (DevRel at Google DeepMind), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 7 more.

argilla by argilla-io

Collaboration tool for building high-quality AI datasets

Created 4 years ago, updated 3 days ago
Top 0.2% on SourcePulse
5k stars