judges by quotient-ai

LLM evaluation library for classifiers and graders

Created 1 year ago
294 stars

Top 89.8% on SourcePulse

Project Summary

This library provides a curated set of research-backed LLM-as-a-Judge evaluators for assessing AI model outputs. It targets developers and researchers who build and evaluate LLM applications, and it offers a low-friction format for using existing evaluators and creating new ones, with the goal of improving output quality and reliability.

How It Works

The library offers two primary judge types: Classifiers, which return a boolean True/False (pass/fail), and Graders, which return a numerical or Likert-scale score. Judges are invoked via a .judge() method that accepts input, output, and optional expected values. The library automatically resolves boolean outputs from the underlying LLM prompts. A Jury object combines multiple judges for a more diverse assessment and averages their judgments into a Verdict.
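The relationship between classifiers, graders, and an averaging jury can be sketched in plain Python. This is an illustration only and does not use the judges package. The names Judgment, Verdict, exact_match_classifier, contains_classifier, overlap_grader, and average_vote are hypothetical, and simple string rules stand in for the LLM calls a real judge would make.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Union


@dataclass
class Judgment:
    """Result of one judge: a score plus the reasoning behind it."""
    score: Union[bool, int]
    reasoning: str


@dataclass
class Verdict:
    """Combined result of several judgments (hypothetical shape)."""
    score: float
    judgments: List[Judgment]


# Assumed judge signature for this sketch: (input, output, expected) -> Judgment
JudgeFn = Callable[[str, str, str], Judgment]


def exact_match_classifier(input: str, output: str, expected: str) -> Judgment:
    """Classifier stand-in: True only if output equals expected (case-insensitive)."""
    passed = output.strip().lower() == expected.strip().lower()
    return Judgment(passed, "exact match" if passed else "not an exact match")


def contains_classifier(input: str, output: str, expected: str) -> Judgment:
    """Classifier stand-in: True if expected appears anywhere in output."""
    passed = expected.strip().lower() in output.lower()
    return Judgment(passed, "expected text found" if passed else "expected text missing")


def overlap_grader(input: str, output: str, expected: str) -> Judgment:
    """Grader stand-in: 1-5 score based on word overlap with expected."""
    out_words = set(output.lower().split())
    exp_words = set(expected.lower().split())
    overlap = len(out_words & exp_words) / max(len(exp_words), 1)
    return Judgment(1 + round(4 * overlap), f"{overlap:.0%} of expected words present")


def average_vote(judges: List[JudgeFn], input: str, output: str, expected: str) -> Verdict:
    """Jury stand-in: run every judge and average the scores (True counts as 1.0)."""
    judgments = [judge(input, output, expected) for judge in judges]
    return Verdict(mean([float(j.score) for j in judgments]), judgments)


verdict = average_vote(
    [exact_match_classifier, contains_classifier],
    input="What is the capital of France?",
    output="The capital is Paris",
    expected="Paris",
)
grade = overlap_grader("What is the capital of France?", "The capital is Paris", "Paris")
```

Here the jury averages only classifiers, so its score reads as the fraction of judges that passed. Mixing boolean and Likert scores in one average would need a normalization step that this sketch leaves out.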

Quick Start & Requirements

  • Install via pip install judges.
  • Requires an API key for the chosen LLM provider (e.g., OPENAI_API_KEY).
  • Example usage involves importing judge classes and calling the .judge() method with model outputs.
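Because every judge call depends on a provider API key, a small preflight check can fail fast with a clear message. The helper below is a sketch and not part of the judges package. The name require_api_key is hypothetical, and OPENAI_API_KEY is used only because it is the example variable given above.

```python
import os


def require_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the key stored in `var_name`, or raise a clear error if it is unset or blank."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} before running any judge.")
    return key
```

Calling require_api_key() once at program start surfaces a missing key before any evaluation work begins.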

Highlighted Details

  • Supports creating custom judges by inheriting from BaseJudge and implementing a .judge() method.
  • Includes an AutoJudge feature to create task-specific judges from labeled datasets and descriptions.
  • Provides a CLI for evaluating single or batch test cases with various judges.
  • Offers a comprehensive appendix listing classifiers and graders with their descriptions and reference papers.
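The custom-judge pattern mentioned in the first bullet (inherit from a base class and implement .judge()) might look roughly like the following. The sketch is self-contained and deliberately does not import the library: SketchBaseJudge is a stand-in whose definition is an assumption, not the real BaseJudge, and MaxLengthJudge and Judgment are hypothetical names. A rule-based check replaces the LLM prompt a real judge would send.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Optional


@dataclass
class Judgment:
    """Score and reasoning returned by a judge (assumed shape for this sketch)."""
    score: bool
    reasoning: str


class SketchBaseJudge(ABC):
    """Stand-in for a base judge class; NOT the library's actual BaseJudge."""

    @abstractmethod
    def judge(self, input: str, output: str, expected: Optional[str] = None) -> Judgment:
        """Evaluate `output` for `input`, optionally against `expected`."""


class MaxLengthJudge(SketchBaseJudge):
    """Hypothetical classifier: passes when the output stays within a word budget."""

    def __init__(self, max_words: int = 50) -> None:
        self.max_words = max_words

    def judge(self, input: str, output: str, expected: Optional[str] = None) -> Judgment:
        n_words = len(output.split())
        passed = n_words <= self.max_words
        return Judgment(passed, f"{n_words} words, limit {self.max_words}")


result = MaxLengthJudge(max_words=3).judge(input="Summarize.", output="one two three")
```

Keeping the base class abstract means a subclass that forgets to implement .judge() fails at instantiation rather than at evaluation time.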

Maintenance & Community

No specific community channels or notable contributors are mentioned in the README.

Licensing & Compatibility

The library's license is not explicitly stated in the README.

Limitations & Caveats

The README does not specify any limitations or known caveats regarding the library's functionality or stability.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 15 stars in the last 30 days
