agent-evaluation  by awslabs

Framework for testing generative AI virtual agents

Created 1 year ago
291 stars

Top 90.6% on SourcePulse

Project Summary

This framework provides a generative AI-powered system for testing virtual agents, specifically targeting those built with AWS services like Amazon Bedrock, Amazon Q Business, and Amazon SageMaker. It enables automated, multi-turn conversational testing and evaluation, aiming to expedite delivery and maintain agent stability within CI/CD pipelines.

How It Works

The core of the framework is an LLM-based evaluator agent that orchestrates conversations with a target agent. It evaluates responses during these multi-turn dialogues, offering built-in support for popular AWS AI services and allowing integration of custom agents. Hooks can be defined for additional tasks like integration testing.
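The evaluator-driven loop described above can be illustrated with a minimal sketch. This is not the framework's API: `ScriptedEvaluator`, `TargetAgent`, and `echo_weather_agent` are hypothetical names, and a fixed script with a keyword check stands in for the LLM-based evaluator so the example runs without any AWS credentials.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical type: a target agent maps a user message to a reply.
TargetAgent = Callable[[str], str]


@dataclass
class ScriptedEvaluator:
    """Stand-in for an LLM-based evaluator (assumption for illustration).

    It sends a fixed list of prompts and judges the conversation with a
    keyword match instead of calling a model.
    """
    steps: List[str]
    expected_keywords: List[str]
    transcript: List[Tuple[str, str]] = field(default_factory=list)

    def run(self, target: TargetAgent, max_turns: int = 5) -> bool:
        # Drive the multi-turn conversation, recording each exchange.
        for prompt in self.steps[:max_turns]:
            reply = target(prompt)
            self.transcript.append((prompt, reply))
        # Evaluate: every expected keyword must appear in some reply.
        full_text = " ".join(reply.lower() for _, reply in self.transcript)
        return all(k.lower() in full_text for k in self.expected_keywords)


def echo_weather_agent(message: str) -> str:
    """Toy target agent used only for this sketch."""
    if "weather" in message.lower():
        return "It is sunny in Seattle today."
    return "Sorry, I can only answer weather questions."


evaluator = ScriptedEvaluator(
    steps=["Hello", "What is the weather in Seattle?"],
    expected_keywords=["sunny", "Seattle"],
)
passed = evaluator.run(echo_weather_agent)
print("passed:", passed)
```

In a real setup the keyword check would be replaced by a model judging each response against the expected results, and the toy target by a call to the deployed agent.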

Quick Start & Requirements

Highlighted Details

  • Supports Amazon Bedrock, Amazon Q Business, and Amazon SageMaker.
  • Enables concurrent, multi-turn conversations.
  • Facilitates integration into CI/CD pipelines.
  • Allows custom agent integration and hook definitions.
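As a rough sketch of how concurrent conversations, a custom agent, and hooks could fit together, the example below runs two multi-turn tests in parallel with setup and teardown callbacks. All names here (`run_test`, `custom_agent`, the hook signatures) are assumptions made for illustration and do not reflect the framework's actual interfaces.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

Agent = Callable[[str], str]   # hypothetical custom-agent interface
Hook = Callable[[str], None]   # hypothetical hook: receives the test name


def run_test(name: str, turns: List[str], must_contain: str, agent: Agent,
             before: Optional[Hook] = None,
             after: Optional[Hook] = None) -> bool:
    """Run one multi-turn conversation and check the final reply."""
    if before:
        before(name)  # e.g. seed test data before the conversation
    try:
        replies = [agent(turn) for turn in turns]
        return must_contain.lower() in replies[-1].lower()
    finally:
        if after:
            after(name)  # e.g. clean up or run an integration check


def custom_agent(message: str) -> str:
    """Toy agent standing in for a user-supplied target."""
    return f"You said: {message}"


events: List[str] = []
tests = {
    "greeting": (["hi", "hello there"], "hello"),
    "order": (["I want pizza", "make it large"], "refund"),
}

# Each test conversation runs in its own worker thread.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = {
        name: pool.submit(
            run_test, name, turns, expected, custom_agent,
            lambda n: events.append(f"setup:{n}"),
            lambda n: events.append(f"teardown:{n}"),
        )
        for name, (turns, expected) in tests.items()
    }
    results: Dict[str, bool] = {n: f.result() for n, f in futures.items()}

print(results)
```

A CI/CD job could then fail the build whenever any entry in `results` is false.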

Maintenance & Community

  • Contributions are welcomed via CONTRIBUTING.md.

Licensing & Compatibility

  • Apache-2.0 License. Permissive for commercial use and closed-source linking.

Limitations & Caveats

The framework is designed primarily for agents built on AWS services. Custom agents can be integrated, but the built-in tooling is oriented toward the AWS ecosystem.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 3
  • Issues (30d): 2
  • Star history: 8 stars in the last 30 days
