judgeval by JudgmentLabs

Agent observability and self-learning toolkit

Created 1 year ago
1,014 stars

Top 36.8% on SourcePulse

1 Expert Loves This Project
Project Summary

Judgeval provides open-source tooling for tracing, evaluating, and monitoring autonomous, stateful agents, with the goal of enabling continuous learning and self-improvement. It captures runtime data from agent-environment interactions and targets developers and researchers who build and deploy AI agents.

How It Works

Judgeval integrates via a Python SDK that automatically traces agent execution, capturing inputs, outputs, tool calls, latency, and custom metadata. This data can be exported for analysis, used to build custom evaluators (including LLM-as-a-judge), and used to trigger alerts for production monitoring. The approach supports debugging, identifying performance bottlenecks, and data-driven agent optimization.
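To make the tracing idea concrete, here is a minimal sketch in plain Python of a decorator that records the inputs, output, latency, and custom metadata of each call. It is not judgeval's API: the names `trace`, `SPANS`, and `lookup_weather` are hypothetical and exist only for illustration, and a real SDK would send spans to a backend rather than keep them in a list.

```python
import functools
import time
from typing import Any, Callable

# Hypothetical in-memory span store (an assumption for this sketch);
# a real SDK would ship spans to a tracing backend instead.
SPANS: list[dict[str, Any]] = []


def trace(**metadata: Any) -> Callable:
    """Return a decorator that records one span per call to the wrapped function."""

    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            span: dict[str, Any] = {
                "name": fn.__name__,
                "inputs": {"args": args, "kwargs": kwargs},
                "metadata": metadata,
            }
            try:
                result = fn(*args, **kwargs)
                span["output"] = result
                return result
            except Exception as exc:
                # Keep the failure on the span, then let it propagate.
                span["error"] = repr(exc)
                raise
            finally:
                # Latency is recorded whether the call succeeded or failed.
                span["latency_s"] = time.perf_counter() - start
                SPANS.append(span)

        return wrapper

    return decorator


@trace(kind="tool")
def lookup_weather(city: str) -> str:
    """Stand-in for an agent tool call."""
    return f"Sunny in {city}"


lookup_weather("Paris")
print(SPANS[0]["name"], SPANS[0]["output"])
```

Each entry in `SPANS` is a plain dictionary, so it can be serialized, filtered, or handed to an evaluator without further conversion.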

Quick Start & Requirements

Highlighted Details

  • Automatic tracing for OpenAI, Anthropic, and LangGraph.
  • Supports LLM-as-a-judge, manual labeling, and code-based evaluators.
  • Production monitoring with Slack alerts and custom hooks.
  • Data export to Parquet/S3 for scaled analysis and A/B testing.
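As a rough illustration of how code-based evaluators and alert hooks can fit together, the sketch below scores a batch of traced calls and invokes a callback when a pass rate falls below a threshold. Everything here is an assumption made for the example, not judgeval's interface: the span fields, the functions `latency_ok`, `non_empty_output`, and `monitor`, the thresholds, and the sample data are all hypothetical. The `alert` callback stands in for a Slack notification or any other custom hook.

```python
from typing import Callable, Iterable

# A span is assumed to be a plain dict such as
# {"name": ..., "output": ..., "latency_s": ...}.
Span = dict


def latency_ok(span: Span, max_latency_s: float = 2.0) -> bool:
    """Code-based evaluator: pass if the call finished within the latency budget."""
    return span["latency_s"] <= max_latency_s


def non_empty_output(span: Span) -> bool:
    """Code-based evaluator: pass if the call produced a non-empty output."""
    return bool(span.get("output"))


def monitor(
    spans: Iterable[Span],
    evaluators: Iterable[Callable[[Span], bool]],
    alert: Callable[[str], None],
    min_pass_rate: float = 0.9,
) -> dict[str, float]:
    """Compute each evaluator's pass rate and call `alert` when it is too low."""
    spans = list(spans)
    rates: dict[str, float] = {}
    for evaluate in evaluators:
        passed = sum(1 for span in spans if evaluate(span))
        rate = passed / len(spans) if spans else 1.0
        rates[evaluate.__name__] = rate
        if rate < min_pass_rate:
            alert(f"{evaluate.__name__} pass rate {rate:.0%} is below {min_pass_rate:.0%}")
    return rates


# Made-up sample spans, for illustration only.
sample_spans = [
    {"name": "plan", "output": "step 1", "latency_s": 0.4},
    {"name": "search", "output": "", "latency_s": 3.1},
]
alerts: list[str] = []
rates = monitor(sample_spans, [latency_ok, non_empty_output], alerts.append)
print(rates)
print(alerts)
```

Passing the alert hook as a callable keeps the monitoring logic independent of the delivery channel, so the same loop can write to a log in tests and post to a chat channel in production.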

Maintenance & Community

Maintained by Judgment Labs. Community channels include Discord (https://discord.gg/tGVFf8UBUY) and X (https://x.com/JudgmentLabs).

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. This requires clarification for commercial use or closed-source linking.

Limitations & Caveats

Because no license is specified, commercial usability cannot be determined, which is a significant blocker. The core functionality relies on a connection to the Judgment Platform, which requires either API keys or a self-hosted instance.

Health Check

  • Last Commit: 2 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 9
  • Issues (30d): 3
  • Star History: 14 stars in the last 30 days

Explore Similar Projects

Starred by Alex Graveley (Creator of GitHub Copilot, Dropbox Paper, Mobilecoin, Hackpad), Gregor Zunic (Cofounder of Browser Use), and 4 more.

lmnr by lmnr-ai

1.0%
3k
Open-source platform for engineering AI products
Created 1 year ago
Updated 1 day ago
Starred by Morgan Funtowicz (Head of ML Optimizations at Hugging Face), Luis Capelo (Cofounder of Lightning AI), and 8 more.

lighteval by huggingface

0.5%
2k
LLM evaluation toolkit for multiple backends
Created 1 year ago
Updated 3 days ago
Starred by Han Wang (Cofounder of Mintlify), John Resig (Author of jQuery; Chief Software Architect at Khan Academy), and 6 more.

evidently by evidentlyai

0.3%
7k
Open-source framework for ML/LLM observability
Created 5 years ago
Updated 2 days ago