meta-agents-research-environments by facebookresearch

Platform for evaluating AI agents in dynamic, realistic scenarios

Created 6 months ago

447 stars

Top 67.3% on SourcePulse

View on GitHub

2 Experts Love This Project

Vincent Weisser

Cofounder of Prime Intellect

Lewis Tunstall

Research Engineer at Hugging Face

Project Summary

Summary

Meta Agents Research Environments (ARE) evaluates AI agents in dynamic, realistic scenarios, overcoming static benchmark limitations. It introduces evolving environments requiring multi-step reasoning and adaptation, targeting AI researchers. ARE offers a comprehensive evaluation framework, notably the Gaia2 benchmark, to better mirror real-world challenges and assess agent capabilities.

How It Works

ARE employs dynamic environments where scenarios evolve with new information, demanding agent adaptation. It supports multi-step reasoning (10+ steps, minutes duration) and grounds agents in simulated applications (email, file systems) via APIs using the ReAct framework. Dynamic events add complexity, orchestrated by scenarios for comprehensive task execution. This approach provides a more challenging and realistic evaluation than traditional static benchmarks.

Quick Start & Requirements

Installation is streamlined via uvx --from meta-agents-research-environments are-benchmark gaia2-run --hf meta-agents-research-environments/gaia2 --hf_split validation -l 1 or pip install meta-agents-research-environments. A prerequisite is the uv Python package installer. ARE supports various AI model providers, requiring API keys or local endpoint configurations. A no-installation demo is available on Hugging Face. Official documentation and tutorials are linked within the README.

Highlighted Details

Dynamic Environments: Scenarios evolve with new information, demanding agent adaptation.
Multi-Step Reasoning: Supports complex tasks requiring over 10 steps and significant execution time.
Gaia2 Benchmark: Features 800 dynamic scenarios across 10 universes for rigorous agent evaluation.
Interactive GUI: Web-based interface for direct agent interaction (Playground Mode) and structured task evaluation (Scenarios Mode).
Model Agnostic: Integrates with multiple LLM providers via LiteLLM (API/local).

Maintenance & Community

The project lists over 20 authors in its research paper citation, suggesting active development. However, the README does not provide direct links to community channels (e.g., Discord, Slack) or a public roadmap.

Licensing & Compatibility

Licensed under the permissive MIT License, this project generally allows broad compatibility with commercial and closed-source applications, with minimal restrictions beyond attribution.

Limitations & Caveats

As a research platform, ARE may undergo ongoing development. Specific performance benchmarks or known limitations (e.g., unsupported platforms, critical bugs) are not explicitly detailed. Setting up advanced agent configurations with specific LLMs may require significant technical expertise and resource allocation.

Health Check

Last Commit

2 days ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

17 stars in the last 30 days