Evaluators for agent trajectories
This library provides evaluators for agent trajectories, helping developers understand and improve the intermediate steps LLM agents take to solve problems. For teams building complex agentic applications, it offers several evaluation methods, including LLM-as-judge scoring and direct trajectory matching.
How It Works
AgentEvals offers several evaluation strategies for agent trajectories, which are sequences of messages or graph steps. Trajectory match evaluators compare an agent's output against a reference trajectory using modes like "strict," "unordered," "subset," or "superset." LLM-as-judge evaluators use a language model to score the trajectory's accuracy, efficiency, and logical progression, with options to include reference trajectories or customize prompts. Graph trajectory evaluators specifically handle agents modeled as graphs, assessing sequences of nodes and steps.
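For example, a trajectory match evaluator is created with a factory and then called like a plain function over OpenAI-style message lists. The following is a minimal sketch based on the library's create_trajectory_match_evaluator API; the specific messages and the get_weather tool are illustrative:

```python
import json

from agentevals.trajectory.match import create_trajectory_match_evaluator

# "unordered" accepts the same messages and tool calls in any order;
# the other modes are "strict", "subset", and "superset".
evaluator = create_trajectory_match_evaluator(
    trajectory_match_mode="unordered",
)

# Trajectories are lists of OpenAI-style message dicts.
trajectory = [
    {"role": "user", "content": "What's the weather in SF?"},
    {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {
                "function": {
                    "name": "get_weather",
                    "arguments": json.dumps({"city": "SF"}),
                }
            }
        ],
    },
    {"role": "tool", "content": "It's 80 degrees and sunny in SF."},
    {"role": "assistant", "content": "It's 80 degrees and sunny in SF."},
]

# Comparing a trajectory against itself trivially matches; in practice the
# reference would be a hand-written or recorded gold trajectory.
result = evaluator(outputs=trajectory, reference_outputs=trajectory)
print(result)  # e.g. {"key": "trajectory_unordered_match", "score": True, "comment": None}
```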
Quick Start & Requirements
Python: pip install agentevals
TypeScript: npm install agentevals @langchain/core
The LLM-as-judge evaluators require model provider credentials (typically an OPENAI_API_KEY environment variable). LangChain integrations are used by default, but direct OpenAI client usage is also supported.
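As a quick smoke test after installing, an LLM-as-judge evaluator can be created and run directly. This sketch assumes the packaged TRAJECTORY_ACCURACY_PROMPT and an OpenAI model string; substitute your own provider and credentials:

```python
import os

from agentevals.trajectory.llm import (
    TRAJECTORY_ACCURACY_PROMPT,
    create_trajectory_llm_as_judge,
)

# The judge makes a real model call, so provider credentials must be set.
os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder; use your real key

judge = create_trajectory_llm_as_judge(
    prompt=TRAJECTORY_ACCURACY_PROMPT,
    model="openai:o3-mini",
)

result = judge(
    outputs=[
        {"role": "user", "content": "Book me a table for two tonight."},
        {"role": "assistant", "content": "Done! You're booked for 7pm."},
    ],
)
print(result)  # {"key": "trajectory_accuracy", "score": ..., "comment": "..."}
```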
Maintenance & Community
The project is associated with LangChain, which posts updates on X as @LangChainAI. Issues and suggestions can be raised on the project's GitHub repository.
Licensing & Compatibility
The repository does not explicitly state a license in the provided README. Compatibility for commercial use or closed-source linking would depend on the final license.
Limitations & Caveats
The README does not specify any limitations or known issues. The LLM-as-judge evaluators rely on external LLM providers, which may introduce variability or cost.