Cradle by BAAI-Agents

Framework for general computer control via foundation agents

created 1 year ago
2,223 stars

Top 20.8% on sourcepulse

Project Summary

Cradle is a framework for General Computer Control (GCC). It lets foundation models perform complex tasks across software and games through a human-like interface: screenshots as input, keyboard and mouse actions as output. It targets researchers and developers building autonomous agents that interact with digital environments.

How It Works

Cradle abstracts computer interaction into a unified environment. It takes screenshots as input, uses Large Language Models (LLMs) for reasoning and planning, and outputs keyboard and mouse commands. Its modular design lets developers integrate custom skills and environment-specific logic, which eases adaptation to new applications.
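
The loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Cradle's actual API: `Action`, `run_agent`, and the stub functions are hypothetical names, and the stubs stand in for real screen capture, an LLM call, and input injection.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Action:
    """A single low-level input event (hypothetical schema)."""
    kind: str      # "key" or "click"
    payload: str   # key name, or "x,y" screen coordinates


def run_agent(
    capture: Callable[[], bytes],
    plan: Callable[[bytes, str], List[Action]],
    execute: Callable[[Action], None],
    task: str,
    max_steps: int = 3,
) -> List[Action]:
    """Run the screenshot -> plan -> input loop for at most max_steps rounds."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture()            # observe the screen
        actions = plan(screenshot, task)  # LLM reasoning would happen here
        if not actions:                   # empty plan: treat the task as finished
            break
        for action in actions:
            execute(action)               # emit keyboard/mouse input
            history.append(action)
    return history


# Stubs standing in for real screen capture, an LLM call, and input injection.
def fake_capture() -> bytes:
    return b"fake-png-bytes"


_scripted = [[Action("key", "w")], [Action("click", "100,200")], []]


def fake_plan(screenshot: bytes, task: str) -> List[Action]:
    return _scripted.pop(0) if _scripted else []


executed: List[Action] = []
history = run_agent(fake_capture, fake_plan, executed.append, task="open the map")
print([(a.kind, a.payload) for a in history])
```

Passing the capture, planning, and execution steps as callables keeps the loop independent of any particular game or model, which mirrors the separation between core framework and environment-specific logic that the summary describes.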

Quick Start & Requirements

  • Installation: Clone the repository, create a conda environment (conda create --name cradle-dev python=3.10), activate it (conda activate cradle-dev), and install dependencies (pip install -r requirements.txt).
  • Prerequisites: Python 3.10, an OpenAI or Anthropic Claude API key (configured in a .env file), and the spaCy language model en_core_web_lg.
  • Setup: Configure the API key and install the dependencies as described above. Detailed setup instructions for specific games and software are provided separately.
  • Documentation: Website, arXiv
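
Before launching an agent, it can help to confirm that the .env file actually contains a key. The sketch below parses simple KEY=VALUE text with the standard library only. The variable names `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` are assumptions for illustration; the names Cradle expects should be taken from the repository's own .env template.

```python
from typing import Dict, Iterable, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env: Dict[str, str], required: Iterable[str]) -> List[str]:
    """Return the required names that are absent or empty."""
    return [name for name in required if not env.get(name)]


# Hypothetical variable names; check the repository's .env template for the real ones.
sample = '# example .env\nOPENAI_API_KEY="sk-placeholder"\nANTHROPIC_API_KEY=\n'
env = parse_env(sample)
print(missing_keys(env, ["OPENAI_API_KEY"]))                       # []
print(missing_keys(env, ["OPENAI_API_KEY", "ANTHROPIC_API_KEY"]))  # ['ANTHROPIC_API_KEY']
```

To check a real file, pass the contents of .env (for example via `pathlib.Path(".env").read_text()`) to `parse_env` instead of the sample string.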

Highlighted Details

  • Supports games such as Red Dead Redemption 2 (RDR2), Stardew Valley, Cities: Skylines, and Dealer's Life 2, and software including Chrome, Outlook, and CapCut.
  • Employs a modular structure with distinct directories for environment configurations, game/software resources, and core framework modules.
  • Features a "Migrate to New Game Guide" for adapting the framework to novel applications.
  • Includes an icon_replacer.py for improving icon recognition by LLMs.
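
One common way to keep per-game skills separate from a core framework is a name-to-function registry. The following is a rough sketch of that pattern only; `SKILLS`, `register_skill`, `call_skill`, and the two example skills are hypothetical and may not match how Cradle organizes its skills.

```python
from typing import Callable, Dict

# Hypothetical registry; Cradle's real skill mechanism may differ.
SKILLS: Dict[str, Callable[..., str]] = {}


def register_skill(name: str) -> Callable[[Callable[..., str]], Callable[..., str]]:
    """Decorator that stores a function in SKILLS under the given name."""
    def decorator(func: Callable[..., str]) -> Callable[..., str]:
        if name in SKILLS:
            raise ValueError(f"skill already registered: {name}")
        SKILLS[name] = func
        return func
    return decorator


@register_skill("move_forward")
def move_forward(duration: float) -> str:
    # A real skill would send key events; here we only describe the action.
    return f"hold W for {duration:.1f}s"


@register_skill("open_inventory")
def open_inventory() -> str:
    return "press I"


def call_skill(name: str, **kwargs) -> str:
    """Look up a skill by name and invoke it with keyword arguments."""
    if name not in SKILLS:
        raise KeyError(f"unknown skill: {name}")
    return SKILLS[name](**kwargs)


print(call_skill("move_forward", duration=1.5))  # hold W for 1.5s
print(sorted(SKILLS))                            # ['move_forward', 'open_inventory']
```

With a registry like this, a planner only needs to emit a skill name and arguments, and supporting a new application means adding new registered functions rather than changing the core loop.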

Maintenance & Community

The project is maintained by BAAI-Agents. The README does not list community channels or other engagement details.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The MIT license generally permits commercial use and linking with closed-source projects.

Limitations & Caveats

The framework's effectiveness is dependent on the LLM's capabilities and the quality of the environment-specific configurations and skills. Adapting to new games or software requires careful implementation following provided guidelines.

Health Check

  • Last commit: 8 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 159 stars in the last 90 days
