athina-evals by athina-ai

Python SDK for LLM response evaluation

Created 1 year ago
292 stars

Top 90.4% on SourcePulse

Project Summary

Athina-evals provides a Python SDK for evaluating Large Language Model (LLM) responses, offering over 50 preset evaluations and support for custom ones. It's designed for AI teams focused on observability and experimentation, serving as a companion to the Athina IDE for prototyping, running experiments, and comparing datasets.
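The split between preset and custom evaluations can be pictured with a small plain-Python sketch. It does not use the athina-evals API. `EvalResult`, `response_not_empty`, `make_contains_keywords`, and `run_evals` are hypothetical names chosen for illustration: `response_not_empty` stands in for a preset check, and the keyword factory stands in for a user-defined evaluation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class EvalResult:
    # Hypothetical result shape: not the SDK's own result type.
    name: str
    passed: bool
    reason: str


# Hypothetical signature: an evaluation takes (query, response) and returns an EvalResult.
EvalFn = Callable[[str, str], EvalResult]


def response_not_empty(query: str, response: str) -> EvalResult:
    """Stand-in for a 'preset' check: fails on blank responses."""
    ok = bool(response.strip())
    return EvalResult("response_not_empty", ok, "non-empty" if ok else "empty response")


def make_contains_keywords(keywords: List[str]) -> EvalFn:
    """Factory for a 'custom' check: passes only if every keyword appears (case-insensitive)."""
    def _eval(query: str, response: str) -> EvalResult:
        missing = [k for k in keywords if k.lower() not in response.lower()]
        reason = "all keywords present" if not missing else f"missing: {', '.join(missing)}"
        return EvalResult("contains_keywords", not missing, reason)
    return _eval


def run_evals(evals: List[EvalFn], query: str, response: str) -> Dict[str, EvalResult]:
    """Run every evaluation on one query/response pair, keyed by evaluation name."""
    results: Dict[str, EvalResult] = {}
    for ev in evals:
        result = ev(query, response)
        results[result.name] = result
    return results


results = run_evals(
    [response_not_empty, make_contains_keywords(["Paris"])],
    "What is the capital of France?",
    "The capital of France is Paris.",
)
for name, r in results.items():
    print(name, r.passed, r.reason)
```

Keeping every evaluation behind one call signature is what lets preset and custom checks run side by side in a single pass; the real SDK's interfaces may differ.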

How It Works

The SDK allows programmatic execution of evaluations, with results visualized and managed within the Athina IDE. This integrated approach facilitates side-by-side dataset comparison and experiment tracking, streamlining the LLM development lifecycle.
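As a rough illustration of what a side-by-side dataset comparison reports, the sketch scores two small datasets with the same checks and lays out per-check pass rates in adjacent columns. It is plain Python, not the SDK or the Athina IDE. `pass_rates`, `compare`, and the row format (dicts with a `response` key) are assumptions made for this example.

```python
from typing import Callable, Dict, List

# Hypothetical row format: each dataset row is a dict with at least a "response" key.
Row = Dict[str, str]
# Hypothetical check signature: maps one row to pass (True) or fail (False).
Check = Callable[[Row], bool]


def pass_rates(rows: List[Row], checks: Dict[str, Check]) -> Dict[str, float]:
    """Fraction of rows passing each named check (0.0 for an empty dataset)."""
    if not rows:
        return {name: 0.0 for name in checks}
    return {name: sum(check(r) for r in rows) / len(rows) for name, check in checks.items()}


def compare(a: List[Row], b: List[Row], checks: Dict[str, Check]) -> List[str]:
    """Format per-check pass rates for datasets A and B as side-by-side text lines."""
    ra, rb = pass_rates(a, checks), pass_rates(b, checks)
    lines = [f"{'eval':<20}{'dataset A':>10}{'dataset B':>10}"]
    for name in checks:
        lines.append(f"{name:<20}{ra[name]:>10.0%}{rb[name]:>10.0%}")
    return lines


checks: Dict[str, Check] = {
    "non_empty": lambda r: bool(r["response"].strip()),
    "under_20_words": lambda r: len(r["response"].split()) < 20,
}
baseline = [{"response": "Paris."}, {"response": ""}]
candidate = [{"response": "Paris."}, {"response": "Berlin."}]

for line in compare(baseline, candidate, checks):
    print(line)
```

Running identical checks over both datasets is what makes the columns comparable; a row-level diff would be the natural next step.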

Quick Start & Requirements

Highlighted Details

  • Over 50 preset evaluations available.
  • Supports custom evaluation creation.
  • Integrates with Athina IDE for enhanced workflow.
  • Enables side-by-side dataset comparison.

Maintenance & Community

No specific contributor or community details are provided in the README.

Licensing & Compatibility

The README does not specify a license.

Limitations & Caveats

The README does not detail any limitations or caveats.

Health Check
Last Commit: 3 months ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 1 star in the last 30 days

Explore Similar Projects

Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Pawel Garbacki (Cofounder of Fireworks AI), and 3 more.

promptbench by microsoft

0.1%
3k stars
LLM evaluation framework
Created 2 years ago
Updated 1 month ago
Starred by Morgan Funtowicz (Head of ML Optimizations at Hugging Face), Luis Capelo (Cofounder of Lightning AI), and 7 more.

lighteval by huggingface

2.6%
2k stars
LLM evaluation toolkit for multiple backends
Created 1 year ago
Updated 1 day ago
Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Jared Palmer (Ex-VP AI at Vercel; Founder of Turborepo; Author of Formik, TSDX), and 3 more.

human-eval by openai

0.4%
3k stars
Evaluation harness for LLMs trained on code
Created 4 years ago
Updated 8 months ago
Starred by Anastasios Angelopoulos (Cofounder of LMArena), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 34 more.

evals by openai

0.2%
17k stars
Framework for evaluating LLMs and LLM systems, plus benchmark registry
Created 2 years ago
Updated 9 months ago