crab by camel-ai

Framework for LLM agent benchmark environments

created 1 year ago
361 stars

Top 78.8% on sourcepulse

Project Summary

CRAB (Cross-environment Agent Benchmark) is a Python-centric framework for building and evaluating multimodal language model agents across diverse environments. It targets researchers and developers working on embodied AI agents and gives agents a unified interface for interacting with multiple environments simultaneously, which simplifies benchmarks that span more than one environment.

How It Works

CRAB enables the creation of agent environments deployable across various platforms (in-memory, Docker, VMs, distributed machines) via Python functions. New agent actions are defined using a simple @action decorator, and environments are constructed by integrating these actions. The framework introduces a novel graph-based evaluation method for fine-grained performance metrics.
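
The decorator pattern described above can be sketched in plain Python. This is a minimal illustration, not CRAB's implementation: the `Action` and `Environment` classes, the `step` method, and the `click` and `type_text` functions are all hypothetical names introduced here, and the real `@action` decorator may record different metadata. See the project documentation for the actual API.

```python
import inspect


class Action:
    """Hypothetical wrapper that records metadata about a plain function."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__
        self.description = inspect.getdoc(func) or ""
        # Map each parameter name to the name of its annotated type.
        self.parameters = {
            name: param.annotation.__name__
            if isinstance(param.annotation, type)
            else str(param.annotation)
            for name, param in inspect.signature(func).parameters.items()
        }

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)


def action(func):
    """Hypothetical stand-in for an @action decorator (assumption, not CRAB's code)."""
    return Action(func)


class Environment:
    """Hypothetical environment built from a list of actions."""

    def __init__(self, name, actions):
        self.name = name
        self._actions = {a.name: a for a in actions}

    def action_names(self):
        return sorted(self._actions)

    def step(self, action_name, **kwargs):
        # Dispatch an agent's chosen action by name.
        return self._actions[action_name](**kwargs)


@action
def click(x: int, y: int) -> str:
    """Click at screen coordinates (x, y)."""
    return f"clicked ({x}, {y})"


@action
def type_text(text: str) -> str:
    """Type the given text."""
    return f"typed {text!r}"


env = Environment("desktop", [click, type_text])
result = env.step("click", x=10, y=20)
```

Wrapping functions this way lets an environment expose each action's name, docstring, and parameter types to an agent without extra boilerplate, which is the general appeal of a decorator-based design.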

Quick Start & Requirements

  • Install: pip install "crab-framework[client]" (the quotes keep shells such as zsh from expanding the brackets)
  • Prerequisites: Python 3.10+, OpenAI API key (for examples).
  • Example execution: run export OPENAI_API_KEY=<your api key>, then python examples/single_env.py or python examples/multi_env.py.
  • Documentation: https://crab.camel-ai.org/

Highlighted Details

  • Cross-platform and multi-environment support with a unified agent interface.
  • Easy action definition via @action decorator.
  • Novel graph evaluator for fine-grained metrics.
  • Includes a benchmark suite (crab-benchmark-v0).
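
One way to picture a graph-based evaluator is as a directed acyclic graph of checkpoints, where a checkpoint counts only once its prerequisites have passed, so a partially finished task earns partial credit. The sketch below is an assumption about the general idea, not CRAB's evaluator: the `evaluate` function, the checkpoint names, and the completion-ratio metric are all hypothetical.

```python
from graphlib import TopologicalSorter


def evaluate(checks, deps, state):
    """Return (completed checkpoints, completion ratio) for a task graph.

    checks: checkpoint name -> predicate over the environment state
    deps:   checkpoint name -> list of prerequisite checkpoint names
    Hypothetical metric for illustration; CRAB's real metrics may differ.
    """
    completed = set()
    # static_order() yields every checkpoint after its prerequisites.
    for node in TopologicalSorter(deps).static_order():
        prerequisites_met = all(p in completed for p in deps.get(node, ()))
        if prerequisites_met and checks[node](state):
            completed.add(node)
    return completed, len(completed) / len(checks)


# Hypothetical three-step task: open an app, enter text, save the file.
checks = {
    "open_app": lambda s: s["app_open"],
    "enter_text": lambda s: s["text"] == "hello",
    "save_file": lambda s: s["saved"],
}
deps = {
    "open_app": [],
    "enter_text": ["open_app"],
    "save_file": ["enter_text"],
}

state = {"app_open": True, "text": "hello", "saved": False}
completed, ratio = evaluate(checks, deps, state)
```

Here the agent finished two of three checkpoints, so `ratio` is 2/3 rather than a flat failure. That finer granularity is what a pass/fail check on the final state cannot provide.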

Maintenance & Community

The project is associated with CAMEL-AI. Further community and roadmap details are not explicitly provided in the README.

Licensing & Compatibility

The README does not specify a license. Compatibility for commercial use or closed-source linking is not detailed.

Limitations & Caveats

The project appears to be research-oriented, with specific examples requiring an OpenAI API key. The absence of a stated license may pose a barrier to commercial adoption.

Health Check

  • Last commit: 4 weeks ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 25 stars in the last 90 days
