rhesis  by rhesis-ai

AI-powered testing platform and SDK for LLM and agentic applications

Created 1 year ago
252 stars

Top 99.6% on SourcePulse

GitHubView on GitHub
Project Summary

Open-source testing platform and SDK for LLM and agentic applications, Rhesis addresses the challenges of non-deterministic outputs and unpredictable inputs. It empowers cross-functional teams by automatically generating hundreds of test scenarios from plain-language requirements, executing them against applications, and evaluating results with AI, thereby identifying critical failures before production deployment.

How It Works

The platform allows users to define application behavior and constraints in plain English, accessible via a UI or SDK. Rhesis then leverages AI to generate a diverse suite of test inputs, including adversarial prompts and edge cases, designed to probe these requirements. These tests, supporting both single-turn and multi-turn interactions, are executed against the target application. Finally, LLM-based evaluation scores the outputs against the defined rules, providing actionable insights into where the application breaks. This automated approach generates comprehensive test coverage for complex LLM behaviors that traditional testing methods struggle to address.

Quick Start & Requirements

  • Installation: Options include a hosted version (app.rhesis.ai with a free tier), a Python SDK (pip install rhesis-sdk), or a zero-configuration local Docker setup (git clone https://github.com/rhesis-ai/rhesis.git && cd rhesis && ./rh start).
  • Prerequisites: Python 3.x (for SDK), Docker (for local deployment), and a RHESIS_API_KEY for AI test generation.
  • Setup Time: Local Docker deployment completes in under 5 minutes.
  • Links: Hosted App: https://app.rhesis.ai/, GitHub: https://github.com/rhesis-ai/rhesis.

Highlighted Details

  • Supports testing of both simple Q&A and complex multi-turn conversations via the 'Penelope' agent.
  • Features a collaborative UI for non-technical stakeholders (legal, compliance) and an SDK/API for engineers, integrating into CI/CD pipelines.
  • AI-driven test generation from natural language requirements, creating adversarial prompts and edge cases.
  • Includes LLM-based evaluation and a library of pre-built metrics, with integrations for RAGAS and DeepEval.

Maintenance & Community

Contributions are welcomed, with community support available via Discord and GitHub Discussions. Enterprise features are planned for 2026.

Licensing & Compatibility

The Community Edition is MIT licensed, permitting free commercial use. Enterprise features are planned for future development.

Limitations & Caveats

Telemetry is collected by default for both cloud and self-hosted instances, though it can be disabled in self-hosted deployments. Enterprise features are not yet available.

Health Check
Last Commit

2 days ago

Responsiveness

Inactive

Pull Requests (30d)
44
Issues (30d)
10
Star History
26 stars in the last 30 days

Explore Similar Projects

Feedback? Help us improve.