agent-evaluation by awslabs

Framework for testing generative AI virtual agents

Created 1 year ago

341 stars

Top 81.4% on SourcePulse

Project Summary

This framework provides a generative AI-powered system for testing virtual agents, specifically targeting those built with AWS services like Amazon Bedrock, Amazon Q Business, and Amazon SageMaker. It enables automated, multi-turn conversational testing and evaluation, aiming to expedite delivery and maintain agent stability within CI/CD pipelines.

How It Works

The core of the framework is an LLM-based evaluator agent that orchestrates conversations with a target agent. It evaluates responses during these multi-turn dialogues, offering built-in support for popular AWS AI services and allowing integration of custom agents. Hooks can be defined for additional tasks like integration testing.
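The loop described above can be sketched in a few lines of Python. This is a simplified illustration, not the framework's actual API: `run_test`, `demo_target`, and `keyword_judge` are hypothetical names, the target is a plain function rather than an AWS-hosted agent, and a keyword match stands in for the LLM-based evaluator.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Conversation:
    # Each turn is a (speaker, text) pair, in the order it was spoken.
    turns: List[Tuple[str, str]] = field(default_factory=list)


def run_test(
    target: Callable[[str], str],
    steps: List[str],
    expected: str,
    judge: Callable[[Conversation, str], bool],
    max_turns: int = 4,
) -> bool:
    """Drive a multi-turn conversation and stop once the judge is satisfied."""
    convo = Conversation()
    for prompt in steps[:max_turns]:
        convo.turns.append(("user", prompt))
        reply = target(prompt)  # hypothetical stand-in for invoking the target agent
        convo.turns.append(("agent", reply))
        if judge(convo, expected):
            return True
    return False


def demo_target(prompt: str) -> str:
    # Hypothetical target agent with canned replies.
    if "claim" in prompt.lower():
        return "Claim 42 is missing a signed form."
    return "How can I help?"


def keyword_judge(convo: Conversation, expected: str) -> bool:
    # Stand-in for the LLM evaluator: checks only the latest agent reply.
    last_reply = convo.turns[-1][1]
    return expected.lower() in last_reply.lower()


passed = run_test(
    demo_target,
    ["Hello", "What is missing for claim 42?"],
    "signed form",
    keyword_judge,
)
```

In the real framework, an LLM both produces the user-side messages and judges the responses; the sketch fixes the user messages in advance to stay self-contained.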

Quick Start & Requirements

Install: pip install agent-evaluation
Prerequisites: Python 3.8+ and the AWS SDK for Python (Boto3), with credentials configured for the target AWS services.
Documentation: https://github.com/awslabs/agent-evaluation/blob/main/docs/index.md
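Before installing, the prerequisites listed above can be checked with a short script. This is a minimal sketch using only the standard library; `check_prerequisites` is a hypothetical helper, not part of the package, and it does not verify that AWS credentials are actually configured.

```python
import importlib.util
import sys
from typing import List, Tuple


def check_prerequisites(min_version: Tuple[int, int] = (3, 8)) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Compare the running interpreter against the minimum stated in the prerequisites.
    if sys.version_info < min_version:
        problems.append(
            "Python %d.%d+ is required, found %d.%d"
            % (min_version + tuple(sys.version_info[:2]))
        )
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec("boto3") is None:
        problems.append("boto3 is not installed")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("Prerequisite issue:", problem)
```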

Highlighted Details

Supports Amazon Bedrock, Amazon Q Business, and Amazon SageMaker.
Enables concurrent, multi-turn conversations.
Facilitates integration into CI/CD pipelines.
Allows custom agent integration and hook definitions.
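To make the points about concurrency, hooks, and CI/CD more concrete, here is a rough sketch of how concurrent test runs with hooks might feed a pipeline exit code. All names (`Hook`, `RecordingHook`, `run_all`, `exit_code`) are hypothetical and chosen for illustration; the framework's own hook interface and runner may differ.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


class Hook:
    """Hypothetical hook interface: code that runs before and after each test."""

    def pre_evaluate(self, test_name: str) -> None:
        pass

    def post_evaluate(self, test_name: str, passed: bool) -> None:
        pass


class RecordingHook(Hook):
    """Example hook that records events, e.g. to drive setup or integration checks."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, str]] = []

    def pre_evaluate(self, test_name: str) -> None:
        self.events.append(("pre", test_name))

    def post_evaluate(self, test_name: str, passed: bool) -> None:
        self.events.append(("post", test_name))


def run_one(name: str, test_fn: Callable[[], bool], hook: Hook) -> Tuple[str, bool]:
    hook.pre_evaluate(name)
    passed = test_fn()
    hook.post_evaluate(name, passed)
    return name, passed


def run_all(tests: Dict[str, Callable[[], bool]], hook: Hook, workers: int = 4) -> Dict[str, bool]:
    # Each test (one conversation) runs in its own worker thread.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(run_one, name, fn, hook) for name, fn in tests.items()]
        return dict(f.result() for f in futures)


def exit_code(results: Dict[str, bool]) -> int:
    # A non-zero exit code lets a CI/CD job fail when any test fails.
    return 0 if all(results.values()) else 1


hook = RecordingHook()
results = run_all({"greets": lambda: True, "refuses": lambda: False}, hook)
```

A pipeline step could then call `sys.exit(exit_code(results))` so that a failed conversation test blocks the deployment.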

Maintenance & Community

Contributions are welcome; see CONTRIBUTING.md for guidelines.

Licensing & Compatibility

Licensed under Apache-2.0, which permits commercial use and linking from closed-source software.

Limitations & Caveats

The framework is designed primarily for agents built on AWS services. Custom agents can be integrated, but the built-in tooling is oriented towards the AWS ecosystem.

Explore Similar Projects

relay by AgentWorkforce

takt by nrslib

claude-code-sub-agent-collective by vanzan01

jido by agentjido

AIShell by PowerShell

cooragent by LeapLabTHU

AutoGroq by jgravelle

ag2 by ag2ai

agency-swarm by VRSEN

SuperAGI by TransformerOptimus

hive by aden-hive

AutoGPT by Significant-Gravitas