Agent testing framework for simulating user interactions
Top 56.6% on SourcePulse
Scenario is an agent testing framework designed for agentic codebases, enabling users to simulate and evaluate agent behavior in various scenarios. It supports Python, TypeScript, and Go, and integrates with any LLM evaluation framework.
How It Works
Scenario allows users to define simulations with custom assertions, agents (including user simulators and judge agents), and scripts that control conversation flow. It uses LLMs to generate user messages and to evaluate agent responses against defined criteria, and it can either execute predefined conversational steps or run in an "autopilot" mode guided by a scenario description.
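To make this concrete, here is a minimal sketch of a Python test in this style, adapted from the project's published examples. `my_agent` is a hypothetical stand-in for the agent under test, and class names such as `AgentAdapter`, `UserSimulatorAgent`, and `JudgeAgent` follow the documented API but may differ across versions:

```python
import pytest
import scenario

# Point the simulator and judge at any LiteLLM-compatible model string.
scenario.configure(default_model="openai/gpt-4.1-mini")


def my_agent(messages):
    # Hypothetical stand-in: replace with your real agent implementation.
    return "Here's a quick vegetarian pasta recipe: ..."


@pytest.mark.agent_test
@pytest.mark.asyncio
async def test_vegetarian_recipe_agent():
    # Adapter wrapping the agent under test so Scenario can call it.
    class Agent(scenario.AgentAdapter):
        async def call(self, input: scenario.AgentInput) -> scenario.AgentReturnTypes:
            return my_agent(input.messages)

    result = await scenario.run(
        name="dinner idea",
        description="The user is hungry and wants a vegetarian dinner recipe.",
        agents=[
            Agent(),
            scenario.UserSimulatorAgent(),  # generates user turns with an LLM
            scenario.JudgeAgent(
                criteria=[
                    "Agent should not ask more than two follow-up questions",
                    "Agent should generate a vegetarian recipe",
                ]
            ),
        ],
    )
    assert result.success
```

With no explicit script, the run proceeds in autopilot mode: the user simulator drives the conversation from the description while the judge scores each turn against the criteria.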
Quick Start & Requirements
- Install: `uv add langwatch-scenario pytest` (Python) or `pnpm install @langwatch/scenario vitest` (TypeScript).
- Requires an `OPENAI_API_KEY` (set as an environment variable).
- Run the example tests: `pytest -s tests/test_vegetarian_recipe_agent.py` (Python) or `npx vitest run tests/vegetarian-recipe-agent.test.ts` (TypeScript).
Highlighted Details

- `JudgeAgent` for real-time evaluation of agent performance against defined criteria.
- Caching (`@scenario.cache()`, `cache_key`) for repeatable tests; a sketch follows this list.
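A hedged sketch of how the two caching hooks combine, assuming `cache_key` is passed via `scenario.configure` and `@scenario.cache()` decorates the expensive call, as the docs describe; `my_agent` is the hypothetical stub from the earlier sketch:

```python
import scenario

# Pinning cache_key makes LLM calls replay deterministically across runs;
# change the key to invalidate the cache and regenerate responses.
scenario.configure(default_model="openai/gpt-4.1-mini", cache_key="recipe-tests-v1")


class CachedAgent(scenario.AgentAdapter):
    @scenario.cache()  # cache this call so repeated test runs stay repeatable
    async def call(self, input: scenario.AgentInput) -> scenario.AgentReturnTypes:
        return my_agent(input.messages)  # hypothetical agent under test
```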
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The framework relies on LLM outputs for simulation and evaluation, which can introduce non-determinism unless caching is effectively utilized. Parallel execution requires specific pytest plugins.