canwhite
Agent evaluation and debugging toolkit
A transparent HTTP proxy for evaluating AI agents, AgentEval captures and structures agent-LLM API traffic. It automates conversation splitting, multi-dimensional grading, rule-based behavioral diagnosis, and LLM-driven configuration probing, offering insights via a web dashboard. This tool is designed for developers and researchers seeking to objectively assess and debug AI agent performance.
How It Works
AgentEval acts as an HTTP proxy, intercepting and logging all agent-LLM API communications. It automatically detects session boundaries using message rollback or idle timeouts, generating structured conversation views. The system then applies automated grading across four dimensions (task completion, tool efficiency, response quality, performance) using rule-based metrics and an LLM judge. Behavioral issues are diagnosed via a 10-rule engine, and an LLM probe, equipped with file access tools, analyzes the agent's source configuration for root causes. Results are presented through a local web dashboard.
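The session-splitting step described above can be illustrated with a minimal Rust sketch. It assumes that "message rollback" means a request whose message history is shorter than the previous request's, and that the idle timeout is a fixed number of seconds. The `Captured` struct, its fields, and the `split_sessions` function are hypothetical names chosen for illustration and are not taken from the AgentEval source.

```rust
/// One captured request, reduced to the two fields this sketch needs.
/// Hypothetical type: not taken from the AgentEval source.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Captured {
    /// Arrival time in seconds since an arbitrary starting point.
    pub at_secs: u64,
    /// Number of messages carried in the request's conversation history.
    pub message_count: usize,
}

/// Splits time-ordered requests into sessions.
///
/// A new session starts when either
/// - the message history is shorter than in the previous request
///   (assumed meaning of "message rollback"), or
/// - more than `idle_timeout_secs` passed since the previous request.
pub fn split_sessions(requests: &[Captured], idle_timeout_secs: u64) -> Vec<Vec<Captured>> {
    let mut sessions: Vec<Vec<Captured>> = Vec::new();

    for &req in requests {
        // Compare against the last request of the most recent session, if any.
        let starts_new = match sessions.last().and_then(|s| s.last()) {
            None => true,
            Some(prev) => {
                let rolled_back = req.message_count < prev.message_count;
                let idle = req.at_secs.saturating_sub(prev.at_secs) > idle_timeout_secs;
                rolled_back || idle
            }
        };

        if starts_new {
            sessions.push(vec![req]);
        } else if let Some(current) = sessions.last_mut() {
            current.push(req);
        }
    }

    sessions
}
```

With a 300-second timeout, two requests with a growing history stay in one session, while a later request with a shorter history, or one arriving after a long pause, opens a new session.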
Quick Start & Requirements
Run: cargo run
Configuration: a .env file setting AGENTEVAL_UPSTREAM (LLM API), AGENTEVAL_PORT, AGENTEVAL_JUDGE_API_BASE, AGENTEVAL_JUDGE_MODEL, AGENTEVAL_JUDGE_API_KEY, and PROBE_SOURCE_PROJECT_DIR.
Agent setup: point the agent's BASE_URL to the AgentEval proxy address (e.g., http://127.0.0.1:57633).
Prerequisites: Rust toolchain (cargo), LLM API access for evaluation features.
Dashboard: http://127.0.0.1:57633/dashboard/.
Highlighted Details
Maintenance & Community
The provided README does not detail community channels, contributors, sponsorships, or a roadmap.
Licensing & Compatibility
The software's license is not specified in the README, making it impossible to determine compatibility for commercial use or closed-source linking.
Limitations & Caveats
The probe feature includes safety mechanisms like path sandboxing and read-only tools to prevent unintended modifications. LLM-based grading and diagnosis summarization are best-effort and may be skipped if LLM API access is unavailable. The probe requires explicit configuration of the agent's source project directory.
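Path sandboxing of the kind mentioned above could look roughly like the following Rust sketch. It is an assumption about the general technique, not a description of AgentEval's actual implementation: the `resolve_in_sandbox` function is hypothetical, `root` stands in for the directory configured as PROBE_SOURCE_PROJECT_DIR, and the check is purely lexical, so it does not resolve symlinks the way a filesystem-based check would.

```rust
use std::path::{Component, Path, PathBuf};

/// Joins `requested` onto `root` and returns the result only if it stays
/// inside `root`. Hypothetical helper for illustration.
///
/// The check is lexical: it never touches the filesystem, so symlinks that
/// point outside `root` are not detected by this sketch.
pub fn resolve_in_sandbox(root: &Path, requested: &str) -> Option<PathBuf> {
    let mut resolved = root.to_path_buf();
    // Number of path components currently appended below `root`.
    let mut depth = 0usize;

    for part in Path::new(requested).components() {
        match part {
            Component::Normal(name) => {
                resolved.push(name);
                depth += 1;
            }
            Component::CurDir => {}
            Component::ParentDir => {
                // Refuse to climb above the sandbox root.
                if depth == 0 {
                    return None;
                }
                resolved.pop();
                depth -= 1;
            }
            // Absolute paths and Windows drive prefixes are rejected outright.
            Component::RootDir | Component::Prefix(_) => return None,
        }
    }

    Some(resolved)
}
```

A read-only file tool built on such a helper would open only the paths it returns and would report an error to the LLM when it returns `None`.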