agent-evaluation by awslabs

Framework for testing generative AI virtual agents

Created 1 year ago
304 stars

Top 87.9% on SourcePulse

Project Summary

This framework uses generative AI to test virtual agents, particularly those built with AWS services such as Amazon Bedrock, Amazon Q Business, and Amazon SageMaker. It automates multi-turn conversational testing and evaluation. The goal is to speed up delivery and maintain agent stability when the tests run in CI/CD pipelines.

How It Works

The core of the framework is an LLM-based evaluator agent that conducts multi-turn conversations with a target agent and evaluates the target's responses as the dialogue proceeds. Built-in support covers popular AWS AI services, and custom agents can also be integrated. Hooks can be defined for additional tasks such as integration testing.
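The following minimal, self-contained Python sketch makes the evaluator/target split concrete. It illustrates the loop under stated assumptions and is not the project's API. `Target`, `ScriptedEvaluator`, and `run_conversation` are hypothetical names. The LLM calls are replaced by a scripted stub that uses fixed user turns and keyword-based grading, so the example runs offline. Pre- and post-conversation hooks mark where integration checks could sit.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Protocol, Sequence, Tuple

Turn = Tuple[str, str]  # (user message, agent reply)


class Target(Protocol):
    """Hypothetical interface for the agent under test."""

    def invoke(self, prompt: str) -> str: ...


@dataclass
class ScriptedEvaluator:
    """Hypothetical stand-in for the LLM evaluator. A real evaluator would ask a
    model to write each user turn and to grade the transcript; this stub replays
    fixed steps and grades with a keyword check so the example runs offline."""

    steps: List[str]
    expected_keyword: str
    max_turns: int = 5

    def next_message(self, transcript: List[Turn]) -> Optional[str]:
        turn = len(transcript)
        if turn >= min(len(self.steps), self.max_turns):
            return None  # end the conversation
        return self.steps[turn]

    def judge(self, transcript: List[Turn]) -> bool:
        return any(self.expected_keyword.lower() in reply.lower() for _, reply in transcript)


def run_conversation(
    evaluator: ScriptedEvaluator,
    target: Target,
    pre_hooks: Sequence[Callable[[], None]] = (),
    post_hooks: Sequence[Callable[[List[Turn], bool], None]] = (),
) -> Tuple[bool, List[Turn]]:
    for hook in pre_hooks:
        hook()  # e.g. seed test data the agent will need
    transcript: List[Turn] = []
    # The evaluator drives the dialogue: it chooses each user turn until it stops.
    while (message := evaluator.next_message(transcript)) is not None:
        transcript.append((message, target.invoke(message)))
    passed = evaluator.judge(transcript)
    for hook in post_hooks:
        hook(transcript, passed)  # e.g. verify a side effect, as in integration testing
    return passed, transcript


# Usage: a toy target and a post-hook that records the outcome.
class OrderBot:
    def invoke(self, prompt: str) -> str:
        return "Your order has shipped." if "order" in prompt.lower() else "How can I help?"


log: List[Tuple[int, bool]] = []
passed, transcript = run_conversation(
    ScriptedEvaluator(steps=["Hi", "Where is my order?"], expected_keyword="shipped"),
    OrderBot(),
    post_hooks=[lambda t, ok: log.append((len(t), ok))],
)
```

Keeping the evaluator behind a small interface means the scripted stub could be swapped for a model-backed evaluator without changing the conversation loop.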


Highlighted Details

  • Supports Amazon Bedrock, Amazon Q Business, and Amazon SageMaker.
  • Enables concurrent, multi-turn conversations.
  • Facilitates integration into CI/CD pipelines.
  • Allows custom agent integration and hook definitions.
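The bullets above mention custom agents, concurrent conversations, and CI/CD use. This page does not show the project's actual interfaces, so what follows is a hypothetical sketch that uses only the Python standard library. `FaqBot`, `make_test`, `run_suite`, and `main` are illustrative names, not agenteval APIs. The sketch runs several conversations in parallel with a thread pool and converts the results into a process exit status that a CI pipeline can act on.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


class FaqBot:
    """Hypothetical custom target: any class exposing invoke(prompt) -> str."""

    def invoke(self, prompt: str) -> str:
        if "refund" in prompt.lower():
            return "Refunds take 5 business days."
        return "Sorry, I did not understand."


def make_test(prompts: List[str], expected: str) -> Callable[[], bool]:
    """Build one multi-turn test: send every prompt, pass if any reply contains `expected`."""

    def test() -> bool:
        bot = FaqBot()  # fresh target per conversation, so parallel runs share no state
        replies = [bot.invoke(p) for p in prompts]  # play out all turns
        return any(expected in reply for reply in replies)

    return test


def run_suite(tests: Dict[str, Callable[[], bool]], workers: int = 4) -> Dict[str, bool]:
    """Run all conversations concurrently and collect pass/fail per test name."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {name: pool.submit(fn) for name, fn in tests.items()}
        return {name: future.result() for name, future in futures.items()}


def main() -> int:
    suite = {
        "refund_policy": make_test(["Hello", "How long do refunds take?"], "5 business days"),
        "fallback": make_test(["Hello"], "did not understand"),
    }
    results = run_suite(suite)
    for name, ok in sorted(results.items()):
        print(f"{'PASS' if ok else 'FAIL'} {name}")
    # CI systems treat a non-zero exit status as a failed step.
    return 0 if all(results.values()) else 1
```

In a CI job, the script's entry point would call `sys.exit(main())`, so a single failing conversation fails the pipeline step. Threads are a reasonable choice here because real targets are typically remote, I/O-bound service calls.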

Maintenance & Community

  • Contributions are welcomed via CONTRIBUTING.md.

Licensing & Compatibility

  • Apache-2.0 License. Permissive for commercial use and closed-source linking.

Limitations & Caveats

The framework is designed primarily for AWS-integrated agents. Custom agents can be added, but the core tooling is oriented heavily toward the AWS ecosystem.

Health Check

  • Last Commit: 1 day ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 0
  • Star History: 10 stars in the last 30 days

Explore Similar Projects

Starred by Andrew Ng (Founder of DeepLearning.AI; Cofounder of Coursera; Professor at Stanford), Thomas Wolf (Cofounder of Hugging Face), and 4 more.

ag2 by ag2ai

1.1%
4k
AgentOS for building AI agents and facilitating multi-agent cooperation
Created 11 months ago
Updated 13 hours ago
Starred by Elie Bursztein (Cybersecurity Lead at Google DeepMind), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 7 more.

SuperAGI by TransformerOptimus

0.1%
17k
Open-source framework for autonomous AI agent development
Created 2 years ago
Updated 9 months ago