Framework for LLM agent benchmark environments
CRAB (Cross-environment Agent Benchmark) is a Python-centric framework for building and evaluating multimodal language model agents across diverse environments. It targets researchers and developers building embodied AI agents, offering a unified interface for agents to interact with multiple environments simultaneously, thereby simplifying complex benchmarking scenarios.
How It Works
CRAB enables the creation of agent environments deployable across various platforms (in-memory, Docker, VMs, distributed machines) via Python functions. New agent actions are defined using a simple @action decorator, and environments are constructed by integrating these actions. The framework introduces a novel graph-based evaluation method for fine-grained performance metrics.
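The ideas above can be sketched in a few lines of plain Python. This is an illustrative approximation, not CRAB's actual API: the action registry, the Environment class, the evaluate function, and the checkpoint names are all hypothetical, and the real @action decorator and graph evaluator may behave differently. The sketch shows functions registered as actions, an environment assembled from them, and a dependency graph of checkpoints that gives partial credit for partially completed tasks.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set

# Hypothetical registry; CRAB's real @action decorator may work differently.
ACTIONS: Dict[str, Callable] = {}


def action(fn: Callable) -> Callable:
    """Register a plain Python function as an agent action (illustrative only)."""
    ACTIONS[fn.__name__] = fn
    return fn


@action
def write_note(state: dict, text: str) -> None:
    state.setdefault("notes", []).append(text)


@action
def mark_done(state: dict) -> None:
    state["done"] = True


@dataclass
class Environment:
    """Hypothetical environment: a named bundle of actions plus mutable state."""
    name: str
    actions: Dict[str, Callable]
    state: dict = field(default_factory=dict)

    def step(self, action_name: str, **kwargs) -> None:
        self.actions[action_name](self.state, **kwargs)


def evaluate(state: dict,
             checks: Dict[str, Callable[[dict], bool]],
             deps: Dict[str, Set[str]]) -> Set[str]:
    """Visit checkpoints in dependency order; a checkpoint counts only if
    its own check passes and all of its prerequisites were reached."""
    reached: Set[str] = set()
    for node in TopologicalSorter(deps).static_order():
        if all(p in reached for p in deps.get(node, ())) and checks[node](state):
            reached.add(node)
    return reached


# Assumed task graph: "task_done" depends on "note_written".
CHECKS = {
    "note_written": lambda s: bool(s.get("notes")),
    "task_done": lambda s: s.get("done", False),
}
DEPS = {"note_written": set(), "task_done": {"note_written"}}

env = Environment("in_memory", dict(ACTIONS))
env.step("write_note", text="hello")
partial = evaluate(env.state, CHECKS, DEPS)   # only the first checkpoint
env.step("mark_done")
full = evaluate(env.state, CHECKS, DEPS)      # both checkpoints
print(sorted(partial), sorted(full))
```

Scoring the fraction of checkpoints reached, rather than a single pass/fail flag, is one way to obtain the fine-grained metrics the description mentions.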
Quick Start & Requirements
Install: pip install crab-framework[client]
Run an example: export OPENAI_API_KEY=<your api key>, followed by python examples/single_env.py or python examples/multi_env.py.
Highlighted Details
New agent actions are defined with the @action decorator.
Maintenance & Community
The project is associated with CAMEL-AI. Further community and roadmap details are not explicitly provided in the README.
Licensing & Compatibility
The README does not specify a license. Compatibility for commercial use or closed-source linking is not detailed.
Limitations & Caveats
The project appears to be research-oriented, with specific examples requiring an OpenAI API key. The absence of a stated license may pose a barrier to commercial adoption.