agent-evaluation  by awslabs

Framework for testing generative AI virtual agents

Created 2 years ago
364 stars

Top 77.2% on SourcePulse

Project Summary

This framework provides a generative AI-powered system for testing virtual agents, specifically targeting those built with AWS services like Amazon Bedrock, Amazon Q Business, and Amazon SageMaker. It enables automated, multi-turn conversational testing and evaluation, aiming to expedite delivery and maintain agent stability within CI/CD pipelines.

How It Works

The core of the framework is an LLM-based evaluator agent that orchestrates conversations with a target agent. It evaluates responses during these multi-turn dialogues, offering built-in support for popular AWS AI services and allowing integration of custom agents. Hooks can be defined for additional tasks like integration testing.
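The loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the framework's actual API: `ScriptedEvaluator`, `Hook`, `EvalResult`, `run_test`, and `toy_agent` are hypothetical names, and the LLM-based evaluator is replaced by a deterministic stand-in so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# Every name below is hypothetical and does not mirror the framework's real API.
Transcript = List[Tuple[str, str]]  # (user message, agent reply) pairs


@dataclass
class Hook:
    """Optional callbacks run before and after a test, e.g. for integration checks."""
    pre: Callable[[], None] = lambda: None
    post: Callable[[bool], None] = lambda passed: None


@dataclass
class EvalResult:
    passed: bool
    transcript: Transcript = field(default_factory=list)


class ScriptedEvaluator:
    """Stand-in for the LLM-based evaluator: replays scripted user turns and
    checks replies for an expected substring instead of querying a model."""

    def __init__(self, user_turns: List[str], expected: str):
        self.user_turns = user_turns
        self.expected = expected

    def next_message(self, transcript: Transcript) -> Optional[str]:
        # Stop once every scripted turn has been sent.
        if len(transcript) < len(self.user_turns):
            return self.user_turns[len(transcript)]
        return None

    def judge(self, transcript: Transcript) -> bool:
        # Pass if any agent reply contains the expected text.
        return any(self.expected in reply for _, reply in transcript)


def run_test(evaluator, target: Callable[[str], str],
             hook: Optional[Hook] = None, max_turns: int = 5) -> EvalResult:
    """Drive a multi-turn conversation with the target, then judge it."""
    hook = hook or Hook()
    hook.pre()
    transcript: Transcript = []
    for _ in range(max_turns):
        message = evaluator.next_message(transcript)
        if message is None:
            break
        transcript.append((message, target(message)))
    passed = evaluator.judge(transcript)
    hook.post(passed)
    return EvalResult(passed, transcript)


def toy_agent(message: str) -> str:
    """Toy target agent used only for this demonstration."""
    return "Your claim number is 42." if "claim" in message else "How can I help?"


result = run_test(ScriptedEvaluator(["Hi", "Open a claim"], "claim number"), toy_agent)
print(result.passed, len(result.transcript))
```

In a real setup the evaluator would ask a model both to produce the next user message and to judge the transcript; the scripted version only keeps the control flow visible.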

Quick Start & Requirements

Highlighted Details

  • Supports Amazon Bedrock, Amazon Q Business, and Amazon SageMaker.
  • Enables concurrent, multi-turn conversations.
  • Facilitates integration into CI/CD pipelines.
  • Allows custom agent integration and hook definitions.
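As a rough illustration of running several conversations concurrently, the sketch below fans independent multi-turn conversations out over a thread pool. Whether the framework itself uses threads is an assumption here, and `run_conversation`, `run_all`, and `echo_agent` are hypothetical names introduced only for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def echo_agent(message: str) -> str:
    """Toy target agent (hypothetical); a real one would call a hosted service."""
    return f"echo: {message}"


def run_conversation(turns: List[str], agent: Callable[[str], str]) -> List[str]:
    """Send each user turn in order and collect the agent's replies."""
    return [agent(turn) for turn in turns]


def run_all(tests: Dict[str, List[str]], agent: Callable[[str], str],
            max_workers: int = 4) -> Dict[str, List[str]]:
    """Run every named conversation in its own worker thread."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(run_conversation, turns, agent)
                   for name, turns in tests.items()}
        # result() blocks until that conversation finishes and re-raises its errors.
        return {name: future.result() for name, future in futures.items()}


replies = run_all(
    {"greeting": ["hi", "thanks"], "billing": ["show my invoice"]},
    echo_agent,
)
print(replies)
```

Turns within one conversation stay sequential because each reply may shape the next message; only separate conversations run in parallel.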

Maintenance & Community

  • Contributions are welcome; see CONTRIBUTING.md.

Licensing & Compatibility

  • Apache-2.0 License. Permissive for commercial use and closed-source linking.

Limitations & Caveats

The framework is designed primarily for agents built on AWS services. Custom agents can be integrated, but the built-in tooling targets the AWS ecosystem.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 6 stars in the last 30 days
