crab by camel-ai

Framework for LLM agent benchmark environments

Created 1 year ago

389 stars

Top 73.8% on SourcePulse

1 Expert Loves This Project

Travis Fischer

Founder of Agentic

Project Summary

CRAB (Cross-environment Agent Benchmark) is a Python-centric framework for building benchmark environments and evaluating multimodal language model agents in them. It targets researchers and developers building embodied AI agents and gives an agent one unified interface to several environments at once, so a single benchmark task can span multiple environments.

How It Works

Environments are built from Python functions and can be deployed in memory, in Docker containers, in virtual machines, or across distributed machines. A new agent action is a Python function marked with the @action decorator, and an environment is assembled from a set of such actions. For evaluation, the framework introduces a graph-based method that produces fine-grained performance metrics.
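To make the decorator-plus-environment idea concrete, here is a minimal, self-contained sketch. It does not import CRAB and is not CRAB's implementation: the `Action` and `Environment` classes, the `action` stand-in decorator, and the `register` and `step` methods are all hypothetical names chosen for illustration. Consult the project documentation for the real API.

```python
import inspect
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Action:
    """Hypothetical wrapper: a callable plus the metadata an agent could be shown."""
    name: str
    description: str
    parameters: dict[str, Any]
    func: Callable[..., Any]

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        return self.func(*args, **kwargs)


def action(func: Callable[..., Any]) -> Action:
    """Stand-in for an @action-style decorator (an assumption, not CRAB's code)."""
    signature = inspect.signature(func)
    # Record each parameter's type annotation so a caller can describe the action.
    parameters = {name: p.annotation for name, p in signature.parameters.items()}
    return Action(
        name=func.__name__,
        description=inspect.getdoc(func) or "",
        parameters=parameters,
        func=func,
    )


@dataclass
class Environment:
    """Hypothetical environment: a named collection of registered actions."""
    name: str
    actions: dict[str, Action] = field(default_factory=dict)

    def register(self, act: Action) -> None:
        self.actions[act.name] = act

    def step(self, action_name: str, **kwargs: Any) -> Any:
        # Look the action up by name and run it with the supplied arguments.
        return self.actions[action_name](**kwargs)


@action
def add(a: int, b: int) -> int:
    """Return the sum of two integers."""
    return a + b


env = Environment(name="calculator")
env.register(add)
result = env.step("add", a=2, b=3)
print(result)  # 5
```

The point of the pattern is that the function's name, docstring, and type annotations double as the action's description, so defining a new action needs no extra schema.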

Quick Start & Requirements

Install: pip install crab-framework[client]
Prerequisites: Python 3.10+, OpenAI API key (for examples).
Example execution: run export OPENAI_API_KEY=<your api key>, then python examples/single_env.py or python examples/multi_env.py.
Documentation: https://crab.camel-ai.org/
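Before running the examples, it can help to confirm the two prerequisites listed above. The following is a small sketch of such a check; the `preflight` function is a hypothetical helper written for this summary, not part of CRAB.

```python
import os
import sys


def preflight(version_info=sys.version_info, environ=os.environ) -> list:
    """Return a list of unmet prerequisites (empty when everything is in place)."""
    problems = []
    # The summary lists Python 3.10+ as a prerequisite.
    if tuple(version_info[:2]) < (3, 10):
        problems.append("Python 3.10 or newer is required")
    # The examples read the OpenAI key from this environment variable.
    if not environ.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("missing:", problem)
```

The version and environment are passed as parameters only so the check can be exercised without changing the real interpreter or shell.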

Highlighted Details

Cross-platform and multi-environment support with a unified agent interface.
Easy action definition via @action decorator.
Novel graph evaluator for fine-grained metrics.
Includes a benchmark suite (crab-benchmark-v0).
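One way to picture a graph evaluator is as a directed acyclic graph of sub-goal checks, where a check only counts once the checks it depends on have passed. The sketch below illustrates that general idea using only the standard library. It is an assumption about the shape of such an evaluator, not CRAB's actual implementation, and the `evaluate` function, the check names, and the `state` dictionary are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+
from typing import Callable


def evaluate(
    checks: dict[str, Callable[[dict], bool]],
    depends_on: dict[str, set[str]],
    state: dict,
) -> dict:
    """Walk sub-goal checks in dependency order and report partial progress.

    Every name used in depends_on must also be a key in checks.
    """
    graph = {node: depends_on.get(node, set()) for node in checks}
    completed: set[str] = set()
    # static_order() yields each node after all of its prerequisites.
    for node in TopologicalSorter(graph).static_order():
        prerequisites_met = graph[node] <= completed
        if prerequisites_met and checks[node](state):
            completed.add(node)
    return {
        "completed": sorted(completed),
        "completion_ratio": len(completed) / len(checks),
        "success": len(completed) == len(checks),
    }


# Hypothetical three-step task: create a file, write to it, send a message.
checks = {
    "file_created": lambda s: s.get("file_exists", False),
    "content_written": lambda s: s.get("content_ok", False),
    "message_sent": lambda s: s.get("message_sent", False),
}
depends_on = {
    "content_written": {"file_created"},
    "message_sent": {"content_written"},
}

report = evaluate(
    checks,
    depends_on,
    {"file_exists": True, "content_ok": True, "message_sent": False},
)
print(report["completed"], report["success"])
```

Compared with a single pass/fail result, the completion ratio gives credit for intermediate progress, which is what makes the metric fine-grained.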

Maintenance & Community

The project is associated with CAMEL-AI. Further community and roadmap details are not explicitly provided in the README.

Licensing & Compatibility

The README does not specify a license. Compatibility for commercial use or closed-source linking is not detailed.

Limitations & Caveats

The project appears to be research-oriented, with specific examples requiring an OpenAI API key. The absence of a stated license may pose a barrier to commercial adoption.

Health Check

Last Commit: 4 days ago

Responsiveness: 1 week

Star History

4 stars in the last 30 days