Auto-GPT-Benchmarks by Significant-Gravitas

Deprecated repo for agent performance benchmarking

created 2 years ago
277 stars

Top 94.5% on sourcepulse

Project Summary

This repository is deprecated and has been superseded by the main Auto-GPT repository's benchmark folder. It was designed to objectively benchmark the performance of AI agents across categories like code generation, information retrieval, memory management, and safety, aiming to save users time and money through automation.

How It Works

The project aimed to provide automated, objective performance metrics for AI agents. It allowed users to compare different agent setups and implementations based on quantifiable results in key operational areas.
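As a rough illustration of comparing agents on quantifiable results, the sketch below aggregates pass/fail outcomes into per-category pass rates and ranks agents by their mean rate. It is not the repository's actual scoring code: the agent names, the category labels, the result values, and the functions `pass_rates` and `rank_agents` are all hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical results as (agent, category, passed) tuples.
# Placeholder names and values for illustration, not real benchmark data.
RESULTS = [
    ("agent_a", "code", True),
    ("agent_a", "retrieval", False),
    ("agent_a", "memory", True),
    ("agent_a", "safety", True),
    ("agent_b", "code", True),
    ("agent_b", "retrieval", True),
    ("agent_b", "memory", False),
    ("agent_b", "safety", False),
]


def pass_rates(results):
    """Return {agent: {category: fraction of challenges passed}}."""
    # Each cell holds [passed_count, total_count].
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for agent, category, passed in results:
        cell = counts[agent][category]
        cell[0] += int(passed)
        cell[1] += 1
    return {
        agent: {cat: p / t for cat, (p, t) in cats.items()}
        for agent, cats in counts.items()
    }


def rank_agents(results):
    """Rank agents by the mean of their per-category pass rates, best first."""
    rates = pass_rates(results)
    scores = {a: sum(c.values()) / len(c) for a, c in rates.items()}
    # Sort by descending score, breaking ties alphabetically by agent name.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for agent, score in rank_agents(RESULTS):
        print(f"{agent}: {score:.2f}")
```

Averaging per-category rates rather than pooling all challenges keeps a category with many challenges from dominating the overall score; a real benchmark might weight categories differently.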

Quick Start & Requirements

This repository is deprecated. For updated benchmarks, refer to the benchmark folder within the main Auto-GPT repository: https://github.com/Significant-Gravitas/AutoGPT.

Highlighted Details

  • Provides objective performance metrics for AI agents.
  • Covers categories such as code generation, retrieval, memory, and safety.
  • Aims to automate benchmarking for time and cost savings.
  • Includes rankings and detailed results for agents like Beebot, mini-agi, and Auto-GPT.

Maintenance & Community

This repository is marked as deprecated. Further development and community engagement are likely focused on the main Auto-GPT repository.

Licensing & Compatibility

The license is not specified in the provided README excerpt. Determining suitability for commercial use or closed-source linking would require checking this repository's license file and, for the successor benchmark, the license of the main Auto-GPT repository.

Limitations & Caveats

The repository is explicitly deprecated, meaning it is no longer actively maintained or updated. Users should consult the successor repository for current benchmarking information and tools.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Simon Willison (co-creator of Django), and 1 more.

tau-bench by sierra-research

2.4%
709
Benchmark for tool-agent-user interaction research
created 1 year ago
updated 3 weeks ago