Cradle by BAAI-Agents

Framework for general computer control via foundation agents

Created 2 years ago
2,489 stars

Top 18.2% on SourcePulse

Project Summary

Cradle is a framework designed for General Computer Control (GCC), enabling foundation models to perform complex tasks across various software and games using a human-like interface of screenshots and keyboard/mouse inputs. It targets researchers and developers aiming to build autonomous agents capable of interacting with digital environments.

How It Works

Cradle operates by abstracting computer interactions into a unified environment. It processes screenshots as input, leverages Large Language Models (LLMs) for reasoning and planning, and outputs keyboard and mouse commands. The framework supports a modular design, allowing for the integration of custom skills and environment-specific logic, facilitating adaptation to new applications.
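The screenshot-in, input-out cycle described above can be sketched as a small loop. This is a minimal illustration, not Cradle's actual API: `Action`, `run_episode`, `fake_capture`, and `fake_plan` are hypothetical names, and the capture, planning, and execution steps are stubbed so the example runs without a screen, a model, or input devices.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """One low-level input event: a key press or a mouse click (hypothetical type)."""
    kind: str      # "key" or "click"
    payload: str   # key name, or "x,y" screen coordinates


def run_episode(
    capture: Callable[[], bytes],
    plan: Callable[[bytes, str], List[Action]],
    execute: Callable[[Action], None],
    task: str,
    max_steps: int = 3,
) -> List[Action]:
    """Observe, plan, act; stop early when the planner returns no actions."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()            # observation: raw screen pixels
        actions = plan(screenshot, task)  # reasoning: an LLM call would go here
        if not actions:
            break
        for action in actions:
            execute(action)               # actuation: keyboard/mouse output
            history.append(action)
    return history


# Stand-ins so the sketch runs without a screen, a model, or input devices.
def fake_capture() -> bytes:
    return b"fake-png-bytes"


def fake_plan(screenshot: bytes, task: str) -> List[Action]:
    return [Action("key", "w")]


performed: List[Action] = []
history = run_episode(fake_capture, fake_plan, performed.append, task="walk forward")
```

Passing the three stages in as callables keeps the loop independent of any particular model or input backend, which mirrors the modular design the summary describes.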

Quick Start & Requirements

  • Installation: Clone the repository, create a conda environment (conda create --name cradle-dev python=3.10), activate it (conda activate cradle-dev), and install dependencies (pip install -r requirements.txt).
  • Prerequisites: Python 3.10, an OpenAI or Anthropic Claude API key (configured in a .env file), and the spaCy language model en_core_web_lg.
  • Setup: Configure the API keys and install the dependencies first. Each supported game or application has its own detailed setup instructions.
  • Documentation: Website, arXiv
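Before a first run, it can help to confirm the prerequisites listed above. The following is a minimal sketch that checks only the Python version and the presence of an API key in `.env`-style text. The key name `OPENAI_API_KEY` and the helper functions are assumptions for illustration; the repository's own .env template defines the actual variable names.

```python
import sys
from typing import Dict, List

# Hypothetical variable name; check the repository's .env template for the real ones.
REQUIRED_KEYS = ["OPENAI_API_KEY"]


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_prerequisites(env_text: str, version=sys.version_info) -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems: List[str] = []
    if tuple(version[:2]) != (3, 10):
        problems.append(f"Python 3.10 expected, found {version[0]}.{version[1]}")
    env = parse_env(env_text)
    for key in REQUIRED_KEYS:
        if not env.get(key):
            problems.append(f"{key} is missing or empty in .env")
    return problems
```

In practice the text would come from reading the `.env` file in the cloned repository, for example `missing_prerequisites(open(".env").read())`.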

Highlighted Details

  • Supports games such as Red Dead Redemption 2 (RDR2), Stardew Valley, Cities: Skylines, and Dealer's Life 2, as well as software including Chrome, Outlook, and CapCut.
  • Employs a modular structure with distinct directories for environment configurations, game/software resources, and core framework modules.
  • Features a "Migrate to New Game Guide" for adapting the framework to novel applications.
  • Includes an icon_replacer.py for improving icon recognition by LLMs.
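One common way to keep environment-specific skills separate from core framework code, as the modular structure above suggests, is a name-to-function registry. The sketch below illustrates that general pattern only; `SKILLS`, `skill`, `call_skill`, and the two example skills are hypothetical and are not taken from Cradle's codebase.

```python
from typing import Callable, Dict

# Hypothetical registry: maps a skill name to the function that performs it.
SKILLS: Dict[str, Callable[..., str]] = {}


def skill(name: str):
    """Decorator that registers a function under a skill name."""
    def register(fn):
        SKILLS[name] = fn
        return fn
    return register


@skill("open_menu")
def open_menu() -> str:
    # A real skill would emit key presses; this one only describes them.
    return "press esc"


@skill("move_to")
def move_to(x: int, y: int) -> str:
    return f"click {x},{y}"


def call_skill(name: str, **kwargs) -> str:
    """Look up a skill by name and run it with keyword arguments."""
    if name not in SKILLS:
        raise KeyError(f"unknown skill: {name}")
    return SKILLS[name](**kwargs)
```

With a registry like this, supporting a new application mostly means adding a new module of decorated functions, while the planner refers to skills by name.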

Maintenance & Community

The project is maintained by BAAI-Agents, although the health check below reports the last commit as one year ago and responsiveness as inactive. The README does not describe community channels or other ways to get involved.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The MIT license generally permits commercial use and linking with closed-source projects.

Limitations & Caveats

The framework's effectiveness is dependent on the LLM's capabilities and the quality of the environment-specific configurations and skills. Adapting to new games or software requires careful implementation following provided guidelines.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 3
  • Issues (30d): 0
  • Star History: 26 stars in the last 30 days
