judgeval by JudgmentLabs

Agent observability and self-learning toolkit

Created 1 year ago
1,020 stars

Top 36.5% on SourcePulse

Project Summary

Judgeval provides open-source tooling for tracing, evaluating, and monitoring autonomous, stateful agents. It captures runtime data from agent-environment interactions to support continuous learning and self-improvement, and it targets developers and researchers who build and deploy AI agents.

How It Works

Judgeval integrates via a Python SDK that automatically traces agent execution, capturing inputs, outputs, tool calls, latency, and custom metadata. This data can be exported for analysis, used to build custom evaluators (including LLM-as-a-judge), and used to trigger alerts for production monitoring. The approach supports debugging, identifying performance bottlenecks, and data-driven agent optimization.
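
The tracing idea described above can be illustrated with a minimal, library-agnostic sketch. This is not Judgeval's API: the `trace` decorator, the `SPANS` list, and the `lookup_weather` tool are hypothetical names chosen for illustration, and a real SDK would send spans to a backend rather than keep them in memory.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# Hypothetical in-memory sink; a real SDK would export spans to a backend.
SPANS: List[Dict[str, Any]] = []


def trace(span_type: str = "function", **metadata: Any) -> Callable:
    """Record inputs, output, error, and latency of each call as a span dict."""

    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            span: Dict[str, Any] = {
                "name": fn.__name__,
                "span_type": span_type,
                "inputs": {"args": args, "kwargs": kwargs},
                "metadata": dict(metadata),
                "output": None,
                "error": None,
            }
            start = time.perf_counter()
            try:
                span["output"] = fn(*args, **kwargs)
                return span["output"]
            except Exception as exc:
                # Keep the failure on the span, then let it propagate.
                span["error"] = repr(exc)
                raise
            finally:
                # Runs on success and on failure, so every call is recorded.
                span["latency_s"] = time.perf_counter() - start
                SPANS.append(span)

        return wrapper

    return decorator


@trace(span_type="tool", team="search")  # "team" is illustrative custom metadata
def lookup_weather(city: str) -> str:
    """Stand-in for a tool call made by an agent."""
    return f"Sunny in {city}"
```

Calling `lookup_weather("Paris")` appends one span to `SPANS` holding the arguments, the return value, the elapsed time, and the custom metadata, which is the kind of record that can later be exported or scored.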

Quick Start & Requirements

Highlighted Details

  • Automatic tracing for OpenAI, Anthropic, and LangGraph.
  • Supports LLM-as-a-judge, manual labeling, and code-based evaluators.
  • Production monitoring with Slack alerts and custom hooks.
  • Data export to Parquet/S3 for scaled analysis and A/B testing.
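
As a rough illustration of a code-based evaluator paired with a monitoring hook, the sketch below scores outputs with a plain function and calls an alert callback when the pass rate drops. It does not use Judgeval's API: `EvalResult`, `contains_required_terms`, and `monitor` are hypothetical names, and the `alert` callback stands in for whatever channel (for example, a Slack webhook) a deployment would use.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class EvalResult:
    name: str
    score: float
    passed: bool


def contains_required_terms(
    output: str, required: List[str], threshold: float = 1.0
) -> EvalResult:
    """Code-based evaluator: fraction of required terms found in the output."""
    if not required:
        return EvalResult("contains_required_terms", 1.0, True)
    lowered = output.lower()
    hits = sum(1 for term in required if term.lower() in lowered)
    score = hits / len(required)
    return EvalResult("contains_required_terms", score, score >= threshold)


def monitor(
    results: List[EvalResult],
    min_pass_rate: float,
    alert: Callable[[str], None],
) -> float:
    """Compute the pass rate and invoke the alert hook if it is too low."""
    if not results:
        return 1.0
    pass_rate = sum(r.passed for r in results) / len(results)
    if pass_rate < min_pass_rate:
        alert(f"pass rate {pass_rate:.0%} below {min_pass_rate:.0%}")
    return pass_rate
```

Passing `print` (or a list's `append` method) as `alert` is enough to try this locally; swapping in a function that posts to a chat channel would not change the evaluator or the threshold logic.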

Maintenance & Community

Maintained by Judgment Labs. Community channels include Discord (https://discord.gg/tGVFf8UBUY) and X (https://x.com/JudgmentLabs).

Licensing & Compatibility

The provided README does not state a license. The terms should be confirmed before commercial use or closed-source linking.

Limitations & Caveats

Because no license is specified, commercial usability cannot be determined. Core functionality depends on a connection to the Judgment Platform, which requires either API keys or a self-hosted instance.

Health Check

Last Commit: 18 hours ago
Responsiveness: Inactive
Pull Requests (30d): 36
Issues (30d): 0

Star History

9 stars in the last 30 days

Explore Similar Projects

Starred by Alex Graveley (Creator of GitHub Copilot, Dropbox Paper, Mobilecoin, Hackpad), Gregor Zunic (Cofounder of Browser Use), and 4 more.

lmnr by lmnr-ai

0.7%
3k
Open-source platform for engineering AI products
Created 1 year ago
Updated 1 day ago
Starred by Morgan Funtowicz (Head of ML Optimizations at Hugging Face), Luis Capelo (Cofounder of Lightning AI), and 8 more.

lighteval by huggingface

0.2%
2k
LLM evaluation toolkit for multiple backends
Created 2 years ago
Updated 5 days ago
Starred by Han Wang (Cofounder of Mintlify), John Resig (Author of jQuery; Chief Software Architect at Khan Academy), and 6 more.

evidently by evidentlyai

1.4%
7k
Open-source framework for ML/LLM observability
Created 5 years ago
Updated 1 day ago