judgeval by JudgmentLabs

Agent observability and self-learning toolkit

Created 1 year ago

1,020 stars

Top 36.5% on SourcePulse

1 Expert Loves This Project

benjibc

Cofounder of Fireworks AI

Project Summary

Judgeval provides open-source tooling for tracing, evaluating, and monitoring autonomous, stateful agents, enabling continuous learning and self-improvement. It captures runtime data from agent-environment interactions and is aimed at developers and researchers who build and deploy AI agents.

How It Works

Judgeval integrates via a Python SDK to automatically trace agent execution, capturing inputs, outputs, tool calls, latency, and custom metadata. This data can be exported for analysis, used to build custom evaluators (including LLM-as-a-judge), and used to trigger alerts for production monitoring. The approach supports debugging, identifying performance bottlenecks, and data-driven agent optimization.
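
To make the idea of tracing concrete, here is a minimal sketch of the kind of record a tracer might capture per call (inputs, output, latency, custom metadata). It uses only the Python standard library and is not the judgeval API: the `trace` decorator, `TRACE_LOG`, and the two example functions are hypothetical names chosen for illustration.

```python
import functools
import time
from typing import Any, Callable

# Hypothetical in-memory sink. A real SDK would ship spans to a backend instead.
TRACE_LOG: list[dict[str, Any]] = []


def trace(span_type: str = "function", **metadata: Any) -> Callable:
    """Wrap a function so each call appends one span record to TRACE_LOG."""

    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            span: dict[str, Any] = {
                "name": fn.__name__,
                "span_type": span_type,
                "inputs": {"args": args, "kwargs": kwargs},
                "metadata": metadata,
            }
            try:
                result = fn(*args, **kwargs)
                span["output"] = result
                return result
            except Exception as exc:
                # Record the failure, then let it propagate unchanged.
                span["error"] = repr(exc)
                raise
            finally:
                # Latency is recorded whether the call succeeded or failed.
                span["latency_s"] = time.perf_counter() - start
                TRACE_LOG.append(span)

        return wrapper

    return decorator


@trace(span_type="tool", version="demo")
def search_tool(query: str) -> str:
    """Stand-in for a tool call made by the agent."""
    return f"results for {query}"


@trace(span_type="agent")
def run_agent(task: str) -> str:
    """Stand-in for an agent step that invokes one tool."""
    return search_tool(task).upper()
```

Calling `run_agent("x")` appends two spans to `TRACE_LOG`: the inner tool call first (it finishes first), then the enclosing agent call. Consult the official docs for the SDK's actual tracing interface.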

Quick Start & Requirements

Install: pip install judgeval
Prerequisites: JUDGMENT_API_KEY and JUDGMENT_ORG_ID environment variables (or JUDGMENT_API_URL for self-hosted).
Docs: https://docs.judgmentlabs.ai/
Demo: https://www.youtube.com/watch?v=1S4LixpVbcc
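
Before running anything, it can help to confirm the required environment variables are set. The following is a small sketch using only the standard library; the helper name `missing_judgment_env` is hypothetical, and only the two variable names listed above are taken from the source. `JUDGMENT_API_URL` is not checked here, since the listing only mentions it for self-hosted setups.

```python
import os
from typing import Mapping, Optional

# Variable names come from the prerequisites above.
REQUIRED_VARS = ("JUDGMENT_API_KEY", "JUDGMENT_ORG_ID")


def missing_judgment_env(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required variable names that are unset or empty."""
    source = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not source.get(name)]


if __name__ == "__main__":
    missing = missing_judgment_env()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("Judgment environment variables are set.")
```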

Highlighted Details

Automatic tracing for OpenAI, Anthropic, and LangGraph.
Supports LLM-as-a-judge, manual labeling, and code-based evaluators.
Production monitoring with Slack alerts and custom hooks.
Data export to Parquet/S3 for scaled analysis and A/B testing.
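
As an illustration of what a code-based evaluator with an alert hook could look like, here is a minimal sketch in plain Python. It does not use judgeval's evaluator or alerting interfaces: `EvalResult`, `latency_evaluator`, `run_with_alerts`, the span dictionary shape, and the 2-second budget are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class EvalResult:
    name: str
    score: float  # 1.0 is best, 0.0 is worst
    passed: bool


def latency_evaluator(span: dict, budget_s: float = 2.0) -> EvalResult:
    """Score a span by how much of its latency budget it used."""
    latency = span["latency_s"]
    score = max(0.0, 1.0 - latency / budget_s)
    return EvalResult(name="latency", score=score, passed=latency <= budget_s)


def run_with_alerts(
    spans: Iterable[dict],
    evaluator: Callable[[dict], EvalResult],
    on_failure: Callable[[dict, EvalResult], None],
) -> list[EvalResult]:
    """Evaluate every span and invoke the hook for each failing one."""
    results = []
    for span in spans:
        result = evaluator(span)
        if not result.passed:
            # In production this hook might post to a chat channel or pager.
            on_failure(span, result)
        results.append(result)
    return results
```

The hook is passed in as a plain callable so the same evaluation loop can feed a chat notification, a log line, or a test double without changes.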

Maintenance & Community

Maintained by Judgment Labs. Community channels include Discord (https://discord.gg/tGVFf8UBUY) and X (https://x.com/JudgmentLabs).

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. This requires clarification for commercial use or closed-source linking.

Limitations & Caveats

The unspecified license is a significant blocker for assessing commercial usability. Core functionality also depends on connecting to the Judgment Platform, which requires API keys or a self-hosted instance.

Health Check

Last Commit: 18 hours ago
Responsiveness: Inactive
Pull Requests (30d): 36
Issues (30d): 0

Star History

9 stars in the last 30 days

Explore Similar Projects

Starred by Elvis Saravia (Founder of DAIR.AI).

awesome-open-mlops by fuzzylabs

Curated list of open-source MLOps tools

Created 4 years ago

Updated 9 months ago

Starred by Alex Graveley (Creator of GitHub Copilot, Dropbox Paper, Mobilecoin, Hackpad), Gregor Zunic (Cofounder of Browser Use), and 4 more.

lmnr by lmnr-ai

Open-source platform for engineering AI products

Created 1 year ago

Updated 1 day ago

Starred by Elie Bursztein (Cybersecurity Lead at Google DeepMind), Philipp Schmid (DevRel at Google DeepMind), and 2 more.

openlit by openlit

AI engineering platform for LLM observability and more

Created 2 years ago

Updated 19 hours ago

holmesgpt by HolmesGPT

AI agent for alert root cause analysis

Created 1 year ago

Updated 17 hours ago

Starred by Marc Klingen (Cofounder of Langfuse), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 4 more.

langwatch by langwatch

LLM ops platform for traces, analytics, evaluations, datasets, and prompt optimization

Created 2 years ago

Updated 17 hours ago

Starred by Amin Ahmad (Cofounder of Vectara), Patrick Kidger (Core Contributor to JAX ecosystem), and 17 more.

aim by aimhubio

Experiment tracker for AI model training runs

Created 6 years ago

Updated 1 day ago

Starred by Morgan Funtowicz (Head of ML Optimizations at Hugging Face), Luis Capelo (Cofounder of Lightning AI), and 8 more.

lighteval by huggingface

LLM evaluation toolkit for multiple backends

Created 2 years ago

Updated 5 days ago

Starred by Alex Atallah (Cofounder of OpenRouter, OpenSea), Logan Kilpatrick (Product Lead on Google AI Studio), and 14 more.

helicone by Helicone

LLM observability platform for monitoring, evaluating, and experimenting

Created 3 years ago

Updated 1 day ago

Starred by Han Wang (Cofounder of Mintlify), John Resig (Author of jQuery; Chief Software Architect at Khan Academy), and 6 more.

evidently by evidentlyai

Open-source framework for ML/LLM observability

Created 5 years ago

Updated 1 day ago

Starred by Aravind Srinivas (Cofounder of Perplexity), Georgios Konstantopoulos (CTO, General Partner at Paradigm), and 14 more.

wandb by wandb

AI developer platform for model training, fine-tuning, and management

Created 9 years ago

Updated 19 hours ago

Starred by Yaowei Zheng (Author of LLaMA-Factory), Luis Capelo (Cofounder of Lightning AI), and 7 more.

opik by comet-ml

Open-source LLM evaluation framework for RAG, agents, and more

Created 2 years ago

Updated 17 hours ago

Starred by Alexey Milovidov (Cofounder of Clickhouse), Marc Klingen (Cofounder of Langfuse), and 20 more.

langfuse by langfuse

Open source LLM engineering platform for observability and evals

Created 2 years ago

Updated 18 hours ago
