agent-evaluation by awslabs

Framework for testing generative AI virtual agents

Created 1 year ago
341 stars

Top 81.4% on SourcePulse

Project Summary

This framework is a generative AI-powered system for testing virtual agents, particularly those built on AWS services such as Amazon Bedrock, Amazon Q Business, and Amazon SageMaker. It automates multi-turn conversational testing and evaluation, with the aim of speeding up delivery and keeping agents stable within CI/CD pipelines.

How It Works

The core of the framework is an LLM-based evaluator agent that orchestrates conversations with a target agent. It evaluates responses during these multi-turn dialogues, offering built-in support for popular AWS AI services and allowing integration of custom agents. Hooks can be defined for additional tasks like integration testing.
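The evaluator-and-target loop described above can be sketched roughly as follows. This is a minimal illustration, not the agent-evaluation API: every name here (Scenario, Outcome, run_scenario, keyword_judge, demo_target) is hypothetical, the target agent is a stub function, and the LLM-based evaluator is replaced by a simple keyword check so the example runs without any AWS access.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

# NOTE: all names below are hypothetical and exist only for illustration.
# They are not part of the agent-evaluation package.

Transcript = List[Tuple[str, str]]  # (user message, agent reply) pairs


@dataclass
class Scenario:
    name: str
    steps: List[str]      # user messages sent to the target, in order
    expected: List[str]   # phrases that should appear in the replies


@dataclass
class Outcome:
    name: str
    passed: bool
    transcript: Transcript = field(default_factory=list)


def keyword_judge(scenario: Scenario, transcript: Transcript) -> bool:
    """Stand-in for the LLM evaluator: pass if every expected phrase appears."""
    replies = " ".join(reply for _, reply in transcript).lower()
    return all(phrase.lower() in replies for phrase in scenario.expected)


def run_scenario(
    scenario: Scenario,
    target: Callable[[str], str],
    judge: Callable[[Scenario, Transcript], bool],
    before: Optional[Callable[[Scenario], None]] = None,
    after: Optional[Callable[[Outcome], None]] = None,
) -> Outcome:
    """Drive a multi-turn conversation with the target, then judge it."""
    if before is not None:
        before(scenario)          # hook, e.g. seed test data
    transcript: Transcript = []
    for message in scenario.steps:
        transcript.append((message, target(message)))
    outcome = Outcome(scenario.name, judge(scenario, transcript), transcript)
    if after is not None:
        after(outcome)            # hook, e.g. integration checks or cleanup
    return outcome


def demo_target(message: str) -> str:
    """Stub target agent used in place of a deployed agent."""
    if "refund" in message.lower():
        return "Your refund has been submitted."
    return "How can I help you today?"


scenario = Scenario(
    name="refund_flow",
    steps=["Hi", "I need a refund for order 42"],
    expected=["refund has been submitted"],
)
outcome = run_scenario(scenario, demo_target, keyword_judge)
print(outcome.name, "passed" if outcome.passed else "failed")
```

In a real setup the judge would call a model rather than match keywords, and the target callable would wrap whichever agent is under test; the before and after parameters show where hooks for integration testing might plug in.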

Quick Start & Requirements

Highlighted Details

  • Supports Amazon Bedrock, Amazon Q Business, and Amazon SageMaker.
  • Enables concurrent, multi-turn conversations.
  • Facilitates integration into CI/CD pipelines.
  • Allows custom agent integration and hook definitions.
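
As a rough illustration of concurrent multi-turn runs feeding a CI/CD gate, the sketch below runs several conversations in a thread pool and derives a process exit code from the results. The function names, the test layout, and the substring check are assumptions made for this example only; they do not reflect how the framework schedules or judges tests.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple

# NOTE: hypothetical names for illustration; not the agent-evaluation API.

Target = Callable[[str], str]


def run_conversation(turns: List[str], expected: str, target: Target) -> bool:
    """Send each turn in order and check the final reply for a phrase."""
    replies = [target(turn) for turn in turns]
    return expected in replies[-1]


def run_all(
    tests: Dict[str, Tuple[List[str], str]],
    target: Target,
    max_workers: int = 4,
) -> Dict[str, bool]:
    """Run every conversation concurrently and collect pass/fail per test."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(run_conversation, turns, expected, target)
            for name, (turns, expected) in tests.items()
        }
        return {name: future.result() for name, future in futures.items()}


def demo_target(message: str) -> str:
    """Stub target agent that simply echoes its input."""
    return f"echo: {message}"


tests = {
    "greeting": (["hi", "goodbye"], "goodbye"),
    "order_status": (["where is my order?"], "shipped"),
}
results = run_all(tests, demo_target)

# A pipeline step would typically fail the build on a non-zero exit code.
exit_code = 0 if all(results.values()) else 1
print(results, "exit code:", exit_code)
```

Threads suit this kind of workload because each conversation mostly waits on network calls to the target agent.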

Maintenance & Community

  • Contributions are welcomed via CONTRIBUTING.md.

Licensing & Compatibility

  • Apache-2.0 License. Permissive for commercial use and closed-source linking.

Limitations & Caveats

The framework is designed primarily for agents built on AWS. Custom agents can be integrated, but the core tooling is heavily oriented toward the AWS ecosystem.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 1
  • Issues (30d): 0
  • Star history: 8 stars in the last 30 days
