Scenario by LangWatch

Agent testing framework for simulating user interactions

created 4 months ago
568 stars

Top 56.6% on SourcePulse

Project Summary

Scenario is an agent testing framework that lets you simulate user interactions and evaluate agent behavior across defined scenarios. It supports Python, TypeScript, and Go, and integrates with any LLM evaluation framework.

How It Works

Scenario allows users to define simulations with custom assertions, agents (including user simulators and judge agents), and scripts that control conversation flow. It leverages LLMs to generate user messages, evaluate agent responses against defined criteria, and can execute predefined conversational steps or run in an "autopilot" mode guided by a description.
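The loop described above (simulated user turns, an agent under test, and a judge checking criteria) can be illustrated with a self-contained plain-Python sketch. The names here are hypothetical stand-ins, not the library's API: the real framework uses LLMs for the user simulator and judge, while this sketch scripts both.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Turn:
    role: str      # "user" or "agent"
    content: str

@dataclass
class JudgeResult:
    success: bool
    failed_criteria: list[str]

def run_simulation(
    agent: Callable[[list[Turn]], str],
    user_messages: list[str],                       # stand-in for an LLM user simulator
    criteria: list[Callable[[list[Turn]], bool]],   # stand-in for an LLM judge agent
) -> JudgeResult:
    """Drive a scripted conversation, then judge the full transcript."""
    transcript: list[Turn] = []
    for msg in user_messages:
        transcript.append(Turn("user", msg))
        transcript.append(Turn("agent", agent(transcript)))
    failed = [c.__name__ for c in criteria if not c(transcript)]
    return JudgeResult(success=not failed, failed_criteria=failed)

# Toy agent under test: always suggests a vegetarian dish.
def recipe_agent(transcript: list[Turn]) -> str:
    return "How about a lentil curry? It's fully vegetarian."

# Toy criterion: the last agent reply must mention "vegetarian".
def recipe_is_vegetarian(transcript: list[Turn]) -> bool:
    return "vegetarian" in transcript[-1].content.lower()

result = run_simulation(
    agent=recipe_agent,
    user_messages=["Give me a dinner idea"],
    criteria=[recipe_is_vegetarian],
)
print(result.success)  # True
```

In the real framework, the scripted `user_messages` would be replaced by an LLM generating turns (the "autopilot" mode), and the criteria list would be handed to a judge agent rather than evaluated as plain predicates.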

Quick Start & Requirements

  • Install: uv add langwatch-scenario pytest (Python) or pnpm install @langwatch/scenario vitest (TypeScript).
  • Prerequisites: OpenAI API key (OPENAI_API_KEY).
  • Run: pytest -s tests/test_vegetarian_recipe_agent.py (Python) or npx vitest run tests/vegetarian-recipe-agent.test.ts (TypeScript).
  • Docs: 📖 Documentation, 📺 Watch Video Tutorial.

Highlighted Details

  • Supports "autopilot" mode where user messages are automatically generated.
  • Allows full control of conversation flow via custom scripts and assertions.
  • Includes a JudgeAgent for real-time evaluation of agent performance against criteria.
  • Features caching mechanisms (@scenario.cache(), cache_key) for repeatable tests.
  • Offers visualization and debugging capabilities via LangWatch integration.
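The caching idea behind `@scenario.cache()` can be sketched with a plain-Python memoization decorator. This is a hypothetical stand-in to show why caching makes LLM-backed tests repeatable, not the framework's actual implementation (which, for instance, persists across test runs):

```python
import functools
import hashlib
import json

_cache: dict[str, str] = {}  # hypothetical in-memory store; the real one persists

def cache(cache_key: str = ""):
    """Memoize a function by its arguments plus an explicit cache_key."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            raw = json.dumps([cache_key, fn.__name__, args, kwargs], default=str)
            key = hashlib.sha256(raw.encode()).hexdigest()
            if key not in _cache:
                _cache[key] = fn(*args, **kwargs)  # expensive call happens once
            return _cache[key]
        return wrapper
    return decorator

calls = 0

@cache(cache_key="v1")
def fake_llm(prompt: str) -> str:
    """Stand-in for a non-deterministic LLM call."""
    global calls
    calls += 1
    return f"response to: {prompt}"

fake_llm("hello")
fake_llm("hello")
print(calls)  # 1 — the second call was served from the cache
```

Bumping `cache_key` (e.g. "v1" to "v2") changes every key, which is a simple way to invalidate cached responses when a prompt or model changes.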

Maintenance & Community

  • Active development with examples provided for Python and TypeScript.
  • Community support via 💬 Discord Community and 🐛 Issue Tracker.

Licensing & Compatibility

  • MIT License. Compatible with commercial and closed-source applications.

Limitations & Caveats

The framework relies on LLM outputs for both simulation and evaluation, which introduces non-determinism unless responses are cached. Parallel test execution requires additional pytest plugins.

Health Check

  • Last commit: 4 days ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 13
  • Issues (30d): 0
  • Star History: 193 stars in the last 30 days
