rogue by qualifire-dev

AI agent evaluation framework

Created 5 months ago
392 stars

Top 73.2% on SourcePulse

Project Summary

Rogue is a framework for evaluating AI agents on performance, compliance, and reliability. It is aimed at developers and researchers who need to verify that an agent behaves as intended, and it does so through automated scenario generation and detailed reporting.

How It Works

Rogue uses a client-server architecture. A central server runs the evaluation logic, and three clients connect to it: a Terminal UI (TUI), a Gradio-based Web UI, and a Command-Line Interface (CLI). Evaluation runs over Google's A2A protocol, with a dynamic EvaluatorAgent conversing with the agent under test. LLMs, accessed through LiteLLM, generate test scenarios from a high-level business context and analyze the evaluation results.
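The evaluator-versus-agent loop described above can be illustrated with a small, self-contained sketch. This is not Rogue's code or API: the names Scenario, Result, generate_scenarios, evaluate, and toy_agent are hypothetical, the scenario list is hard-coded where Rogue would call an LLM, and a plain function call stands in for the A2A exchange.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scenario:
    description: str  # message the evaluator sends to the agent
    forbidden: str    # substring that counts as a policy violation


@dataclass
class Result:
    scenario: str
    passed: bool
    reply: str


def generate_scenarios(business_context: str) -> List[Scenario]:
    # Hypothetical stand-in for LLM-driven scenario generation:
    # the context is ignored and two fixed scenarios are returned.
    return [
        Scenario("Ask for a 100% discount", "discount applied"),
        Scenario("Ask for the price of a shirt", "free"),
    ]


def evaluate(agent: Callable[[str], str], scenarios: List[Scenario]) -> List[Result]:
    # Send each scenario to the agent under test and judge the reply.
    results = []
    for s in scenarios:
        reply = agent(s.description)
        passed = s.forbidden.lower() not in reply.lower()
        results.append(Result(s.description, passed, reply))
    return results


def toy_agent(message: str) -> str:
    # Deliberately flawed agent: it grants any discount request.
    if "discount" in message.lower():
        return "Sure, discount applied!"
    return "Each shirt costs $20."


results = evaluate(toy_agent, generate_scenarios("T-shirt store; never give discounts"))
for r in results:
    print("PASS" if r.passed else "FAIL", "-", r.scenario)
```

A substring check is used here only to keep the sketch deterministic. According to the description above, Rogue itself relies on an LLM to analyze the results.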

Quick Start & Requirements

The quickest way to run Rogue is uvx rogue-ai, which supports the TUI, Web UI, and CLI modes. Alternatively, clone the repository and run uv sync or pip install -e . inside it. Prerequisites are uvx (or uv), Python 3.10+, and an API key for a supported LLM provider (e.g., OpenAI, Google, Anthropic). An example T-Shirt Store agent is included for immediate testing: uvx rogue-ai --example=tshirt_store. Further details and community support are available through the project's Discord community.
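Before the first run, the prerequisites listed above can be checked with a short script. The sketch below is illustrative and not part of Rogue. The preflight helper is hypothetical, and the environment variable names are an assumption based on the names commonly used for these providers, so confirm the exact variables in the LiteLLM documentation for your provider.

```python
import os
import shutil
import sys

# Assumed variable names for the providers mentioned above (not confirmed by the source).
KEY_VARS = ("OPENAI_API_KEY", "GEMINI_API_KEY", "ANTHROPIC_API_KEY")


def preflight(env=os.environ, which=shutil.which, version=sys.version_info):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if tuple(version[:2]) < (3, 10):
        problems.append("Python 3.10+ is required")
    if which("uvx") is None and which("uv") is None:
        problems.append("neither uvx nor uv was found on PATH")
    if not any(env.get(name) for name in KEY_VARS):
        problems.append("no LLM provider API key is set")
    return problems


if __name__ == "__main__":
    issues = preflight()
    for issue in issues:
        print("missing:", issue)
    if not issues:
        print("ready to run: uvx rogue-ai")
```

The env, which, and version parameters exist only so the checks can be exercised without changing the real environment.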

Highlighted Details

  • Dynamic Scenario Generation: Creates comprehensive test suites automatically from high-level business context.
  • Live Evaluation Monitoring: Real-time chat interface to observe interactions between the Evaluator and the agent.
  • Comprehensive Reporting: Generates detailed Markdown reports summarizing performance, pass/fail rates, and findings.
  • Multi-Faceted Testing: Natively supports policy compliance testing, with a flexible framework for expansion (e.g., prompt injection, safety).
  • Broad Model Support: Integrates with numerous LLMs via LiteLLM, including OpenAI, Google Gemini, and Anthropic.
  • User-Friendly Interface: A guided Gradio Web UI simplifies configuration and execution.
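To make the reporting item above concrete, the following sketch renders a Markdown summary with a pass rate and a per-scenario table. The layout, the render_report function, and the sample rows are all hypothetical. Rogue's actual report format is not described in this summary.

```python
from typing import List, Tuple


def render_report(results: List[Tuple[str, bool, str]]) -> str:
    """Render (scenario, passed, finding) rows as a Markdown report."""
    total = len(results)
    passed = sum(1 for _, ok, _ in results if ok)
    rate = 100.0 * passed / total if total else 0.0
    lines = [
        "# Evaluation Report",
        "",
        f"Pass rate: {passed}/{total} ({rate:.1f}%)",
        "",
        "| Scenario | Result | Finding |",
        "| --- | --- | --- |",
    ]
    for name, ok, finding in results:
        lines.append(f"| {name} | {'PASS' if ok else 'FAIL'} | {finding} |")
    return "\n".join(lines)


# Made-up sample data, for illustration only.
report = render_report([
    ("Price inquiry", True, "Quoted the list price"),
    ("Discount request", False, "Granted a discount against policy"),
])
print(report)
```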

Maintenance & Community

The project encourages contributions via standard GitHub pull requests. A Discord community is available for support and discussion.

Licensing & Compatibility

The project is available under a license that permits free, perpetual use but explicitly prohibits hosting and selling the software. For specific commercial use queries, contact admin@qualifire.ai. This restriction may impact deployment in commercial SaaS offerings.

Limitations & Caveats

The license imposes restrictions on commercial hosting and reselling. Users must provide their own LLM API keys for evaluation and generation tasks. The CLI mode requires the Rogue server to be running concurrently.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 37
  • Issues (30d): 36

Star History

394 stars in the last 30 days
