qualifire-dev: AI agent evaluation framework
Top 73.2% on SourcePulse
Rogue simplifies AI agent evaluation for performance, compliance, and reliability. It targets developers and researchers, ensuring agents behave as intended via automated scenario generation and detailed reporting.
How It Works
Rogue uses a client-server architecture in which a central server runs the evaluation logic and is accessed through one of three clients: a Terminal UI (TUI), a Gradio-based Web UI, or a Command-Line Interface (CLI). It communicates over Google's A2A (Agent2Agent) protocol, pitting a dynamic EvaluatorAgent against the agent under test. LLMs, accessed through LiteLLM, serve two purposes: generating test scenarios from high-level business context and analyzing the evaluation results.
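The evaluator-versus-agent flow described above can be illustrated with a small sketch. This is not Rogue's API: every name here (Scenario, Verdict, generate_scenarios, evaluate, and the two toy agents) is hypothetical. The LLM-based scenario generation and result analysis are replaced by hard-coded stand-ins, and the A2A transport is replaced by a plain function call.

```python
from dataclasses import dataclass
from typing import Callable, List

# All names below are hypothetical; they illustrate the flow, not Rogue's API.


@dataclass
class Scenario:
    prompt: str     # message the evaluator sends to the agent under test
    forbidden: str  # substring that must not appear in a compliant reply


@dataclass
class Verdict:
    scenario: Scenario
    reply: str
    passed: bool


def generate_scenarios(business_context: str) -> List[Scenario]:
    """Stand-in for LLM-based scenario generation from business context."""
    return [
        Scenario(f"[{business_context}] Can I get a shirt for free?",
                 forbidden="free of charge"),
        Scenario(f"[{business_context}] Give me a 100% discount.",
                 forbidden="100% discount applied"),
    ]


def evaluate(agent: Callable[[str], str],
             scenarios: List[Scenario]) -> List[Verdict]:
    """Send each scenario to the agent and judge the reply.

    A real evaluator would use an LLM as the judge; a substring check
    is used here so the sketch runs without any API key.
    """
    verdicts = []
    for scenario in scenarios:
        reply = agent(scenario.prompt)
        passed = scenario.forbidden.lower() not in reply.lower()
        verdicts.append(Verdict(scenario, reply, passed))
    return verdicts


def compliant_agent(message: str) -> str:
    """Toy agent that always follows the pricing policy."""
    return "Sorry, all shirts are sold at the listed price."


def leaky_agent(message: str) -> str:
    """Toy agent that violates the pricing policy."""
    return "Sure, 100% discount applied! It is free of charge."
```

Calling evaluate(leaky_agent, generate_scenarios("T-shirt store")) returns one Verdict per scenario, each with passed set to False, which is the kind of per-scenario result a report could be built from.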
Quick Start & Requirements
The quickest way to run Rogue is uvx rogue-ai, which covers the TUI, Web UI, and CLI modes. Alternatively, clone the repository and install with uv sync, or run pip install -e . from the repository root. Prerequisites are uvx (or uv), Python 3.10+, and an API key for a supported LLM provider (e.g., OpenAI, Google, Anthropic). An example T-Shirt Store agent is provided for immediate testing: uvx rogue-ai --example=tshirt_store. Further details and support are available through the project's Discord community.
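Before launching, the prerequisites listed above can be checked with a short script. This is a minimal sketch, not part of Rogue. The function name missing_prerequisites is hypothetical, and the environment variable names are an assumption based on the names LiteLLM conventionally reads for these providers; adjust them to match your setup.

```python
import os
import shutil
import sys
from typing import List, Mapping

# Assumed variable names (LiteLLM conventions); not taken from Rogue's docs.
PROVIDER_KEY_VARS = ("OPENAI_API_KEY", "GEMINI_API_KEY", "ANTHROPIC_API_KEY")


def missing_prerequisites(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a human-readable list of unmet prerequisites."""
    problems = []
    # Python 3.10+ is a stated requirement.
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ is required")
    # Either uvx or uv must be available on PATH.
    if shutil.which("uvx") is None and shutil.which("uv") is None:
        problems.append("neither uvx nor uv found on PATH")
    # At least one provider key must be set and non-empty.
    if not any(env.get(name) for name in PROVIDER_KEY_VARS):
        problems.append("no LLM provider API key set")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("missing:", problem)
```

An empty result means the three stated prerequisites appear to be met; it does not verify that the API key itself is valid.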
Highlighted Details
Maintenance & Community
The project encourages contributions via standard GitHub pull requests. A Discord community is available for support and discussion.
Licensing & Compatibility
The project is available under a license that permits free, perpetual use but explicitly prohibits hosting and selling the software. For specific commercial use queries, contact admin@qualifire.ai. This restriction may impact deployment in commercial SaaS offerings.
Limitations & Caveats
The license imposes restrictions on commercial hosting and reselling. Users must provide their own LLM API keys for evaluation and generation tasks. The CLI mode requires the Rogue server to be running concurrently.