judgeval by JudgmentLabs

Agent observability and self-learning toolkit

Created 10 months ago
1,008 stars

Top 37.1% on SourcePulse

1 Expert Loves This Project
Project Summary

Judgeval provides open-source tooling for tracing, evaluating, and monitoring autonomous, stateful agents, enabling continuous learning and self-improvement. It captures runtime data from agent-environment interactions, targeting developers and researchers building and deploying AI agents.

How It Works

Judgeval integrates through a Python SDK that automatically traces agent execution, capturing inputs, outputs, tool calls, latency, and custom metadata. The captured data can be exported for analysis, used to build custom evaluators (including LLM-as-a-judge), and used to trigger alerts in production monitoring. This makes it easier to debug agents, locate performance bottlenecks, and optimize agents based on recorded data.
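
To make the tracing idea concrete, here is a minimal sketch of decorator-based tracing in plain Python. It is not Judgeval's SDK or API: the names `traced`, `TRACE_LOG`, and `lookup_weather` are hypothetical, and the in-memory list stands in for whatever backend a real tracer would send spans to. It only illustrates how a wrapper might capture inputs, outputs, latency, and custom metadata for each call.

```python
import functools
import time
from typing import Any, Callable

# Hypothetical in-memory trace store (assumption for illustration only);
# a real tracing SDK would ship these records to a backend instead.
TRACE_LOG: list[dict[str, Any]] = []


def traced(span_type: str = "function", **metadata: Any) -> Callable:
    """Wrap a function so each call records inputs, output, latency, and metadata."""

    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            record: dict[str, Any] = {
                "name": fn.__name__,
                "span_type": span_type,
                "inputs": {"args": args, "kwargs": kwargs},
                "metadata": metadata,
                "output": None,
                "error": None,
            }
            try:
                record["output"] = fn(*args, **kwargs)
                return record["output"]
            except Exception as exc:
                # Keep the failure in the trace, then re-raise unchanged.
                record["error"] = repr(exc)
                raise
            finally:
                # Runs on both success and failure, so every call is logged.
                record["latency_s"] = time.perf_counter() - start
                TRACE_LOG.append(record)

        return wrapper

    return decorator


# Hypothetical agent tool used only to demonstrate the decorator.
@traced(span_type="tool", version="demo")
def lookup_weather(city: str) -> str:
    return f"Sunny in {city}"
```

Calling `lookup_weather("Paris")` appends one record to `TRACE_LOG` with the function name, arguments, return value, and elapsed time. A decorator keeps the agent code unchanged, which is the usual reason tracing tools favor this style of integration.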

Quick Start & Requirements

Highlighted Details

  • Automatic tracing for OpenAI, Anthropic, and LangGraph.
  • Supports LLM-as-a-judge, manual labeling, and code-based evaluators.
  • Production monitoring with Slack alerts and custom hooks.
  • Data export to Parquet/S3 for scaled analysis and A/B testing.
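
As a rough illustration of code-based evaluators and monitoring hooks, the sketch below scores trace records with two simple checks and invokes a callback on each failure. Everything here is hypothetical and not taken from Judgeval: `EvalResult`, `latency_evaluator`, `non_empty_output_evaluator`, and `monitor` are invented names, the record fields (`latency_s`, `output`) are an assumed shape, and the `on_failure` callback merely stands in for a Slack alert or custom hook.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass
class EvalResult:
    name: str
    score: float
    passed: bool


def latency_evaluator(record: dict[str, Any], budget_s: float = 2.0) -> EvalResult:
    """Pass when the recorded latency is within the (assumed) budget."""
    passed = record["latency_s"] <= budget_s
    return EvalResult("latency_budget", 1.0 if passed else 0.0, passed)


def non_empty_output_evaluator(record: dict[str, Any]) -> EvalResult:
    """Pass when the agent produced a non-blank output string."""
    passed = bool(str(record.get("output") or "").strip())
    return EvalResult("non_empty_output", 1.0 if passed else 0.0, passed)


def monitor(
    records: Iterable[dict[str, Any]],
    evaluators: list[Callable[[dict[str, Any]], EvalResult]],
    on_failure: Callable[[dict[str, Any], EvalResult], None],
) -> int:
    """Run every evaluator on every record; call the hook for each failure."""
    failures = 0
    for record in records:
        for evaluate in evaluators:
            result = evaluate(record)
            if not result.passed:
                failures += 1
                on_failure(record, result)
    return failures
```

A caller could pass `on_failure=lambda rec, res: alerts.append((rec["name"], res.name))` to collect failures in a list, or swap in a function that posts to a chat webhook. Keeping evaluators as plain functions over records means the same checks can run offline on exported data and online in production.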

Maintenance & Community

Maintained by Judgment Labs. Community channels include Discord (https://discord.gg/tGVFf8UBUY) and X (https://x.com/JudgmentLabs).

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. This requires clarification for commercial use or closed-source linking.

Limitations & Caveats

The license is not specified, which is a significant blocker for determining commercial usability. The core functionality relies on connecting to the Judgment Platform, requiring API keys or a self-hosted instance.

Health Check

  • Last commit: 18 hours ago
  • Responsiveness: Inactive
  • Pull requests (30d): 47
  • Issues (30d): 0
  • Star history: 229 stars in the last 30 days
