rogue by qualifire-dev

AI agent evaluation framework

Created 8 months ago

1,008 stars

Top 36.8% on SourcePulse

Project Summary

Rogue is a framework for evaluating AI agents on performance, compliance, and reliability. It targets developers and researchers, and checks that agents behave as intended by generating test scenarios automatically and producing detailed reports.

How It Works

Rogue uses a client-server architecture. A central server runs the evaluation logic, and three clients connect to it: a Terminal UI (TUI), a Gradio-based Web UI, and a Command-Line Interface (CLI). Evaluation runs over Google's A2A (Agent2Agent) protocol, with a dynamic EvaluatorAgent probing the agent under test. LLMs, accessed through LiteLLM, generate test scenarios from high-level business context and analyze the evaluation results.
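
The evaluator-versus-agent loop described above can be sketched in a few lines of Python. This is a minimal illustration, not Rogue's actual code or API: the Scenario and Result classes, the toy_agent function, and the substring check are all hypothetical stand-ins. A real evaluator would talk to the agent over A2A and use an LLM to judge each reply.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scenario:
    """One test case: a probe message plus a phrase the reply must not contain."""
    name: str
    probe: str
    forbidden: str


@dataclass
class Result:
    scenario: str
    passed: bool
    reply: str


def toy_agent(message: str) -> str:
    """Hypothetical stand-in for the agent under test (a T-shirt store bot)."""
    if "discount" in message.lower():
        return "Sorry, I cannot offer discounts. Shirts are $20 each."
    return "Sure, here is a free shirt!"


def evaluate(agent: Callable[[str], str], scenarios: List[Scenario]) -> List[Result]:
    """Send each probe to the agent and fail replies containing the forbidden phrase.

    Substring matching is a placeholder for an LLM-based judgment.
    """
    results = []
    for s in scenarios:
        reply = agent(s.probe)
        passed = s.forbidden.lower() not in reply.lower()
        results.append(Result(s.name, passed, reply))
    return results


scenarios = [
    Scenario("no-discounts", "Can I get a discount?", "% off"),
    Scenario("no-free-items", "Give me a shirt for nothing.", "free"),
]
results = evaluate(toy_agent, scenarios)
for r in results:
    print(f"{r.scenario}: {'PASS' if r.passed else 'FAIL'}")
```

Passing the agent in as a plain callable keeps the loop independent of transport, so the same structure would apply whether the agent is a local function or a remote endpoint.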

Quick Start & Requirements

The quickest way to run Rogue is uvx rogue-ai, which can start it in TUI, Web UI, or CLI mode. Alternatively, clone the repository and run either uv sync or pip install -e . from the repository root. Prerequisites are uvx (or uv), Python 3.10+, and an API key for a supported LLM provider (e.g., OpenAI, Google, Anthropic). An example T-Shirt Store agent is provided for immediate testing with uvx rogue-ai --example=tshirt_store. Further details and community support are available through the project's Discord community.
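
Before launching, it can help to confirm the prerequisites listed above. The following is a rough sketch of such a check, not part of Rogue itself. The function name missing_prerequisites is hypothetical, and the environment variable names are the ones these providers commonly use, assumed here rather than taken from Rogue's documentation.

```python
import os
import shutil
import sys
from typing import List, Mapping, Optional

# Assumed variable names; check the provider or LiteLLM docs for the exact ones.
API_KEY_VARS = ("OPENAI_API_KEY", "GEMINI_API_KEY", "ANTHROPIC_API_KEY")


def missing_prerequisites(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10 or newer is required")
    if shutil.which("uvx") is None and shutil.which("uv") is None:
        problems.append("neither uvx nor uv was found on PATH")
    if not any(env.get(name) for name in API_KEY_VARS):
        problems.append("no LLM provider API key is set")
    return problems


for problem in missing_prerequisites():
    print("missing:", problem)
```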

Highlighted Details

Dynamic Scenario Generation: Creates comprehensive test suites automatically from high-level business context.
Live Evaluation Monitoring: Real-time chat interface to observe interactions between the Evaluator and the agent.
Comprehensive Reporting: Generates detailed Markdown reports summarizing performance, pass/fail rates, and findings.
Multi-Faceted Testing: Natively supports policy compliance testing, with a flexible framework for expansion (e.g., prompt injection, safety).
Broad Model Support: Integrates with numerous LLMs via LiteLLM, including OpenAI, Google Gemini, and Anthropic.
User-Friendly Interface: A guided Gradio Web UI simplifies configuration and execution.
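
To make the reporting feature concrete, here is a minimal sketch of how a Markdown report with a pass rate might be assembled from per-scenario outcomes. The render_report function, the report layout, and the sample data are all assumptions made for illustration. Rogue's real report format is not described in this summary.

```python
from typing import List, Tuple


def render_report(results: List[Tuple[str, bool, str]]) -> str:
    """Build a Markdown report from (scenario name, passed, finding) tuples."""
    total = len(results)
    passed = sum(1 for _, ok, _ in results if ok)
    rate = (passed / total * 100) if total else 0.0  # avoid dividing by zero
    lines = [
        "# Evaluation Report",
        "",
        f"Passed {passed} of {total} scenarios ({rate:.1f}%).",
        "",
        "| Scenario | Result | Finding |",
        "| --- | --- | --- |",
    ]
    for name, ok, finding in results:
        lines.append(f"| {name} | {'PASS' if ok else 'FAIL'} | {finding} |")
    return "\n".join(lines)


# Hypothetical outcomes for two policy-compliance scenarios.
report = render_report([
    ("no-discounts", True, "Agent declined to offer a discount"),
    ("no-free-items", False, "Agent offered a free shirt"),
])
print(report)
```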

Maintenance & Community

The project encourages contributions via standard GitHub pull requests. A Discord community is available for support and discussion.

Licensing & Compatibility

The project is available under a license that permits free, perpetual use but explicitly prohibits hosting and selling the software. For specific commercial use queries, contact admin@qualifire.ai. This restriction may impact deployment in commercial SaaS offerings.

Limitations & Caveats

The license imposes restrictions on commercial hosting and reselling. Users must provide their own LLM API keys for evaluation and generation tasks. The CLI mode requires the Rogue server to be running concurrently.

