Framework for evaluating conversational agents in dual-control environments
τ²-Bench is a simulation framework for evaluating customer service conversational agents across domains such as airline, retail, and telecom. It provides a dual-control environment in which the agent and a user simulator interact, allowing rigorous performance assessment. Agent developers get a standardized way to test and benchmark their agents in realistic, simulated scenarios.
How It Works
τ²-Bench defines specific policies, tools, and tasks for each domain. An orchestrator manages the conversation flow, passing messages between the agent, a user simulator, and the environment. The agent uses a set of provided tools to interact with the environment, while the user simulator mimics real user behavior. Agent performance is then evaluated on policy adherence and task completion.
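The message flow described above can be pictured with a small toy loop. This is only a sketch under stated assumptions, not τ²-Bench's actual code: the names Message, Environment, scripted, and run_episode, the "STOP" marker, and the two telecom-style tools are all hypothetical and were invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Message:
    sender: str                 # "agent", "user", or "env"
    content: str
    tool: Optional[str] = None  # name of a tool the sender wants to call


@dataclass
class Environment:
    """Shared state that both sides can change through tools."""
    state: dict = field(default_factory=dict)
    tools: dict = field(default_factory=dict)

    def call(self, tool: str) -> Message:
        return Message("env", self.tools[tool](self.state))


def scripted(sender: str, script: list) -> Callable:
    """Stand-in for an LLM: replays (content, tool) pairs in order."""
    steps = iter(script)

    def act(history: list) -> Message:
        content, tool = next(steps)
        return Message(sender, content, tool)

    return act


def run_episode(agent, user, env, max_turns: int = 20) -> list:
    """Hypothetical orchestrator: routes messages until someone says STOP."""
    history = [user([])]  # the user opens the conversation
    speaker = "agent"
    for _ in range(max_turns):
        msg = agent(history) if speaker == "agent" else user(history)
        history.append(msg)
        if msg.content == "STOP":
            break
        if msg.tool is not None:
            # Tool result goes back to the caller, who speaks again.
            history.append(env.call(msg.tool))
            continue
        speaker = "user" if speaker == "agent" else "agent"
    return history


def toggle_airplane_mode(state: dict) -> str:
    state["airplane_mode"] = not state["airplane_mode"]
    return f"airplane_mode={state['airplane_mode']}"


env = Environment(
    state={"airplane_mode": True},
    tools={
        "get_line_status": lambda state: "line active",
        "toggle_airplane_mode": toggle_airplane_mode,
    },
)
agent = scripted("agent", [
    ("Let me check your line.", "get_line_status"),
    ("Please turn off airplane mode.", None),
    ("Anything else?", None),
])
user = scripted("user", [
    ("My phone has no service.", None),
    ("Turning it off now.", "toggle_airplane_mode"),
    ("It works now.", None),
    ("STOP", None),
])

history = run_episode(agent, user, env)
# Toy success criterion: the task is done once airplane mode is off.
task_succeeded = env.state["airplane_mode"] is False
```

In this toy, the agent and the simulated user each call a tool against the same environment state, which is one way to read "dual control". A real evaluation would also check the transcript against the domain policy; that step is left out here.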
Quick Start & Requirements
Install: pip install -e . (after cloning the repository)
Configuration: .env file
Run tau2 check-data to verify the data directory setup
Run tau2 domain <domain>, then visit http://127.0.0.1:8004/redoc
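For repeatable setup, the commands listed above could be collected in a short helper script. This is a sketch only: the function names are hypothetical, and using "airline" as the value for <domain> is an assumption based on the domains named earlier. The script prints the commands and does not run them unless run_steps is called explicitly.

```python
import shlex
import subprocess

DOCS_URL = "http://127.0.0.1:8004/redoc"  # address given in the steps above


def quick_start_commands(domain):
    """Return the CLI steps listed above, in order, as argument lists."""
    return [
        ["pip", "install", "-e", "."],  # run from the cloned repository root
        ["tau2", "check-data"],
        ["tau2", "domain", domain],
    ]


def run_steps(commands):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        print("$", shlex.join(cmd))
        subprocess.run(cmd, check=True)


# "airline" is assumed to be a valid <domain> value.
commands = quick_start_commands("airline")
for cmd in commands:
    print(shlex.join(cmd))
print("then open", DOCS_URL)
```

Since the last command is followed by a visit to a local URL, it presumably starts a local server and stays in the foreground, so calling run_steps(commands) would likely block on that step until interrupted.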
Highlighted Details
agent-llm
Agents can run solo or with an oracle plan (llm_agent_gt)
LLM caching: set LLM_CACHE_ENABLED to True
Maintenance & Community
Repository: https://github.com/sierra-research/tau2-bench
Licensing & Compatibility
Limitations & Caveats