rhesis by rhesis-ai

AI-powered testing platform and SDK for LLM and agentic applications

Created 1 year ago

283 stars

Top 92.5% on SourcePulse

Project Summary

Open-source testing platform and SDK for LLM and agentic applications, Rhesis addresses the challenges of non-deterministic outputs and unpredictable inputs. It empowers cross-functional teams by automatically generating hundreds of test scenarios from plain-language requirements, executing them against applications, and evaluating results with AI, thereby identifying critical failures before production deployment.

How It Works

The platform allows users to define application behavior and constraints in plain English, accessible via a UI or SDK. Rhesis then leverages AI to generate a diverse suite of test inputs, including adversarial prompts and edge cases, designed to probe these requirements. These tests, supporting both single-turn and multi-turn interactions, are executed against the target application. Finally, LLM-based evaluation scores the outputs against the defined rules, providing actionable insights into where the application breaks. This automated approach generates comprehensive test coverage for complex LLM behaviors that traditional testing methods struggle to address.

Quick Start & Requirements

Installation: Options include a hosted version (app.rhesis.ai with a free tier), a Python SDK (pip install rhesis-sdk), or a zero-configuration local Docker setup (git clone https://github.com/rhesis-ai/rhesis.git && cd rhesis && ./rh start).
Prerequisites: Python 3.x (for SDK), Docker (for local deployment), and a RHESIS_API_KEY for AI test generation.
Setup Time: Local Docker deployment completes in under 5 minutes.
Links: Hosted App: https://app.rhesis.ai/, GitHub: https://github.com/rhesis-ai/rhesis.

Highlighted Details

Supports testing of both simple Q&A and complex multi-turn conversations via the 'Penelope' agent.
Features a collaborative UI for non-technical stakeholders (legal, compliance) and an SDK/API for engineers, integrating into CI/CD pipelines.
AI-driven test generation from natural language requirements, creating adversarial prompts and edge cases.
Includes LLM-based evaluation and a library of pre-built metrics, with integrations for RAGAS and DeepEval.

Maintenance & Community

Contributions are welcomed, with community support available via Discord and GitHub Discussions. Enterprise features are planned for 2026.

Licensing & Compatibility

The Community Edition is MIT licensed, permitting free commercial use. Enterprise features are planned for future development.

Limitations & Caveats

Telemetry is collected by default for both cloud and self-hosted instances, though it can be disabled in self-hosted deployments. Enterprise features are not yet available.

rhesis by rhesis-ai

Explore Similar Projects

riteway by paralleldrive

agent-evaluation by awslabs

arbigent by takahirom

Devyan by theyashwanthsai

pentest-copilot by bugbasesecurity

full-stack-fastapi-nextjs-llm-template by vstorm-co

testzeus-hercules by test-zeus-ai

testhub_platform by chenjigang4167

browser-agent by magnitudedev

shortest by antiwork

giskard-oss by Giskard-AI

coze-loop by coze-dev