agentevals by langchain-ai

Evaluators for agent trajectories

created 5 months ago
282 stars

Top 93.5% on sourcepulse

Project Summary

This library provides evaluators for agent trajectories, helping developers understand and improve the intermediate steps LLM agents take to solve problems. It offers various evaluation methods, including LLM-as-judge and direct trajectory matching, catering to developers building complex agentic applications.

How It Works

AgentEvals offers several evaluation strategies for agent trajectories, which are sequences of messages or graph steps. Trajectory match evaluators compare an agent's output against a reference trajectory using modes like "strict," "unordered," "subset," or "superset." LLM-as-judge evaluators use a language model to score the trajectory's accuracy, efficiency, and logical progression, with options to include reference trajectories or customize prompts. Graph trajectory evaluators specifically handle agents modeled as graphs, assessing sequences of nodes and steps.
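To make the four match modes concrete, here is a minimal, self-contained sketch of what each mode checks, with a trajectory reduced to its ordered list of tool calls. This is an illustration of the semantics only, not the agentevals API; the function and step names are hypothetical.

```python
# Hypothetical sketch of trajectory-match modes -- not the agentevals API.
# A trajectory is simplified to the ordered list of tool names the agent called.
from collections import Counter

def matches(outputs: list, reference: list, mode: str = "strict") -> bool:
    if mode == "strict":      # same calls, same order
        return outputs == reference
    if mode == "unordered":   # same calls (with multiplicity), any order
        return Counter(outputs) == Counter(reference)
    if mode == "subset":      # agent called no tools outside the reference
        return set(outputs) <= set(reference)
    if mode == "superset":    # agent covered every tool in the reference
        return set(outputs) >= set(reference)
    raise ValueError(f"unknown mode: {mode}")

reference = ["search_flights", "book_flight"]
print(matches(["book_flight", "search_flights"], reference, "unordered"))  # True
print(matches(["search_flights"], reference, "subset"))                    # True
print(matches(["search_flights"], reference, "strict"))                    # False
```

Under this reading, "strict" is the right choice when step order matters, "unordered" when only coverage of the same steps matters, and "subset"/"superset" when you want one-directional containment.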

Quick Start & Requirements

  • Installation: pip install agentevals (Python) or npm install agentevals @langchain/core (TypeScript).
  • Prerequisites: For LLM-as-judge evaluators, an OpenAI API key is required and should be set as an environment variable (OPENAI_API_KEY). LangChain integrations are used by default, but direct OpenAI client usage is also supported.
  • Demo: The README provides detailed Python and TypeScript examples for various evaluators.

Highlighted Details

  • Supports flexible tool argument matching (exact, ignore, subset, superset, custom overrides).
  • Includes specialized evaluators for graph-based agent trajectories (e.g., from LangGraph).
  • Offers asynchronous support for all evaluators.
  • Integrates with LangSmith for experiment tracking and evaluation logging.
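The flexible tool-argument matching mentioned above can be pictured with a small sketch. This is illustrative only and not the library's implementation; the function name and argument shapes are assumptions.

```python
# Hypothetical sketch of tool *argument* matching modes -- illustrative only,
# not the agentevals implementation.

def args_match(actual: dict, expected: dict, mode: str = "exact") -> bool:
    if mode == "exact":     # every key and value must agree
        return actual == expected
    if mode == "ignore":    # arguments are not compared at all
        return True
    if mode == "subset":    # every actual arg must appear in the expected args
        return all(expected.get(k) == v for k, v in actual.items())
    if mode == "superset":  # actual args must cover every expected arg
        return all(actual.get(k) == v for k, v in expected.items())
    raise ValueError(f"unknown mode: {mode}")

expected = {"city": "Paris", "limit": 5}
print(args_match({"city": "Paris"}, expected, "subset"))    # True
print(args_match({"city": "Paris"}, expected, "superset"))  # False
```

A custom override would replace the per-key equality check with a user-supplied comparison for specific tools or arguments.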

Maintenance & Community

The project is maintained by LangChain AI (on X as @LangChainAI). Issues and suggestions can be raised on the GitHub repository.

Licensing & Compatibility

The repository does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking would depend on the final license.

Limitations & Caveats

The README does not specify any limitations or known issues. The LLM-as-judge evaluators rely on external LLM providers, which may introduce variability or cost.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 4
  • Issues (30d): 4
  • Star History: 102 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Steven Hao (cofounder of Cognition), and 6 more.

  • openai-agents-python by openai (1.5%, 13k stars): Python SDK for multi-agent workflows. Created 4 months ago, updated 7 hours ago.