Scenario by LangWatch

Agent testing framework for simulating user interactions

created 4 months ago
568 stars

Top 56.6% on SourcePulse

Project Summary

Scenario is an agent testing framework that lets you simulate user interactions and evaluate agent behavior across defined scenarios. It supports Python, TypeScript, and Go, and integrates with any LLM evaluation framework.

How It Works

Scenario allows users to define simulations with custom assertions, agents (including user simulators and judge agents), and scripts that control conversation flow. It leverages LLMs to generate user messages, evaluate agent responses against defined criteria, and can execute predefined conversational steps or run in an "autopilot" mode guided by a description.
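The loop described above (simulated user turns, an agent under test, and a judge checking criteria) can be illustrated with a self-contained plain-Python sketch. The names here are hypothetical stand-ins, not the library's API: the real framework uses LLMs for the user simulator and judge, while this sketch scripts both.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Turn:
    role: str      # "user" or "agent"
    content: str

@dataclass
class JudgeResult:
    success: bool
    failed_criteria: list[str]

def run_simulation(
    agent: Callable[[list[Turn]], str],
    user_messages: list[str],                       # stand-in for an LLM user simulator
    criteria: list[Callable[[list[Turn]], bool]],   # stand-in for an LLM judge agent
) -> JudgeResult:
    """Drive a scripted conversation, then judge the full transcript."""
    transcript: list[Turn] = []
    for msg in user_messages:
        transcript.append(Turn("user", msg))
        transcript.append(Turn("agent", agent(transcript)))
    failed = [c.__name__ for c in criteria if not c(transcript)]
    return JudgeResult(success=not failed, failed_criteria=failed)

# Toy agent under test: always suggests a vegetarian dish.
def recipe_agent(transcript: list[Turn]) -> str:
    return "How about a lentil curry? It's fully vegetarian."

# Toy criterion: the last agent reply must mention "vegetarian".
def recipe_is_vegetarian(transcript: list[Turn]) -> bool:
    return "vegetarian" in transcript[-1].content.lower()

result = run_simulation(
    agent=recipe_agent,
    user_messages=["Give me a dinner idea"],
    criteria=[recipe_is_vegetarian],
)
print(result.success)  # True
```

In the real framework, the scripted `user_messages` would be replaced by an LLM generating turns (the "autopilot" mode), and the criteria list would be handed to a judge agent rather than evaluated as plain predicates.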

Quick Start & Requirements

  • Install: uv add langwatch-scenario pytest (Python) or pnpm install @langwatch/scenario vitest (TypeScript).
  • Prerequisites: OpenAI API key (OPENAI_API_KEY).
  • Run: pytest -s tests/test_vegetarian_recipe_agent.py (Python) or npx vitest run tests/vegetarian-recipe-agent.test.ts (TypeScript).
  • Docs: 📖 Documentation, 📺 Watch Video Tutorial.

Highlighted Details

  • Supports "autopilot" mode where user messages are automatically generated.
  • Allows full control of conversation flow via custom scripts and assertions.
  • Includes a JudgeAgent for real-time evaluation of agent performance against criteria.
  • Features caching mechanisms (@scenario.cache(), cache_key) for repeatable tests.
  • Offers visualization and debugging capabilities via LangWatch integration.
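The caching idea behind `@scenario.cache()` can be sketched with a plain-Python memoization decorator. This is a hypothetical stand-in to show why caching makes LLM-backed tests repeatable, not the framework's actual implementation (which, for instance, persists across test runs):

```python
import functools
import hashlib
import json

_cache: dict[str, str] = {}  # hypothetical in-memory store; the real one persists

def cache(cache_key: str = ""):
    """Memoize a function by its arguments plus an explicit cache_key."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            raw = json.dumps([cache_key, fn.__name__, args, kwargs], default=str)
            key = hashlib.sha256(raw.encode()).hexdigest()
            if key not in _cache:
                _cache[key] = fn(*args, **kwargs)  # expensive call happens once
            return _cache[key]
        return wrapper
    return decorator

calls = 0

@cache(cache_key="v1")
def fake_llm(prompt: str) -> str:
    """Stand-in for a non-deterministic LLM call."""
    global calls
    calls += 1
    return f"response to: {prompt}"

fake_llm("hello")
fake_llm("hello")
print(calls)  # 1 — the second call was served from the cache
```

Bumping `cache_key` (e.g. "v1" to "v2") changes every key, which is a simple way to invalidate cached responses when a prompt or model changes.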

Maintenance & Community

  • Active development with examples provided for Python and TypeScript.
  • Community support via 💬 Discord Community and 🐛 Issue Tracker.

Licensing & Compatibility

  • MIT License. Compatible with commercial and closed-source applications.

Limitations & Caveats

The framework relies on LLM outputs for both simulation and evaluation, which introduces non-determinism unless responses are cached. Parallel test execution requires additional pytest plugins.

Health Check

  • Last commit: 4 days ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 13
  • Issues (30d): 0
  • Star History: 193 stars in the last 30 days
