sierra-research
Framework for evaluating conversational agents in dual-control environments
Top 74.6% on SourcePulse
τ²-Bench is a simulation framework designed for evaluating customer service conversational agents across various domains like airline, retail, and telecom. It provides a dual-control environment where both the agent and a user simulator interact, allowing for rigorous performance assessment. This framework benefits agent developers by offering a standardized method to test and benchmark their agents' capabilities in realistic, simulated scenarios.
How It Works
τ²-Bench operates by defining specific policies, tools, and tasks for each domain. An orchestrator manages the conversation flow, passing messages between the agent, a user simulator, and the environment. The agent can utilize a set of provided tools to interact with the environment, while the user simulator mimics real user behavior. This setup enables the evaluation of agent performance based on adherence to policies and task completion success.
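The message flow described above can be illustrated with a minimal sketch. It is not the τ²-Bench API: every name in it (Message, Environment, run_episode, scripted_agent, scripted_user, the cancel_booking tool, and the "STOP" signal) is hypothetical, and for brevity only the agent calls tools. It assumes an orchestrator that alternates turns, routes tool calls to the environment, and checks task completion against the final environment state.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Message:
    sender: str                      # "user", "agent", or "env"
    content: str
    tool_call: Optional[str] = None  # name of a tool the sender wants to run


class Environment:
    """Toy environment (hypothetical): holds state and exposes tools by name."""

    def __init__(self) -> None:
        self.state = {"booking_cancelled": False}
        self.tools = {"cancel_booking": self._cancel_booking}

    def _cancel_booking(self) -> str:
        self.state["booking_cancelled"] = True
        return "booking cancelled"

    def call(self, tool_name: str) -> str:
        return self.tools[tool_name]()


def scripted_user(history: List[Message]) -> Message:
    # Stand-in for an LLM user simulator: one request, then stop.
    if not history:
        return Message("user", "Please cancel my booking.")
    return Message("user", "STOP")


def scripted_agent(history: List[Message]) -> Message:
    # Stand-in for an LLM agent: call the tool, then report the result.
    if history[-1].sender == "user":
        return Message("agent", "", tool_call="cancel_booking")
    return Message("agent", "Your booking has been cancelled.")


Policy = Callable[[List[Message]], Message]


def run_episode(agent: Policy, user: Policy, env: Environment,
                max_steps: int = 10) -> Tuple[List[Message], bool]:
    """Pass messages between user, agent, and environment until the user stops."""
    history: List[Message] = []
    speaker = "user"
    for _ in range(max_steps):
        if speaker == "user":
            msg = user(history)
            history.append(msg)
            if msg.content == "STOP":
                break
            speaker = "agent"
        else:
            msg = agent(history)
            history.append(msg)
            if msg.tool_call is not None:
                # Route the tool call to the environment; the agent speaks
                # again next so it can read the tool result.
                history.append(Message("env", env.call(msg.tool_call)))
            else:
                speaker = "user"
    # Task success is judged from the final environment state.
    return history, env.state["booking_cancelled"]


if __name__ == "__main__":
    transcript, success = run_episode(scripted_agent, scripted_user, Environment())
    for m in transcript:
        print(m.sender, "|", m.tool_call or m.content)
    print("task completed:", success)
```

In a real setup the scripted functions would be replaced by LLM-backed policies, and the success check would also cover policy adherence, not just the final state.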
Quick Start & Requirements
Install with pip install -e . (after cloning the repository).
Set API keys in a .env file.
Run tau2 check-data to ensure data directory setup.
View domain documentation by running tau2 domain <domain> and visiting http://127.0.0.1:8004/redoc.
Highlighted Details
Agent modes: agent-llm solo or with an oracle plan (llm_agent_gt).
Enable LLM caching by setting LLM_CACHE_ENABLED to True.
Maintenance & Community
Repository: https://github.com/sierra-research/tau2-bench
Licensing & Compatibility
Limitations & Caveats