digirl by DigiRL-agent

Research code for device-control agents via autonomous reinforcement learning

Created 1 year ago
378 stars

Top 75.2% on SourcePulse

Project Summary

DigiRL provides a framework for training device-control agents with autonomous reinforcement learning, specifically targeting in-the-wild Android environments. It is aimed at researchers and developers building robust agents for complex, real-world interactions, with a focus on offline and offline-to-online learning.

How It Works

The core of DigiRL lies in its two proposed training algorithms: DigiRL (combining automatic curriculum learning with doubly robust estimator filtering) and Filtered Behavior Cloning (employing reward-based filtering). These methods enable three training modes: offline-only, online-only, and offline-to-online, allowing for flexible agent development from pre-collected data to fully interactive learning. The framework supports AutoUI and CogAgent agents and is designed for tasks like general browsing and web shopping on Android.
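
To make the reward-based filtering idea concrete, here is a minimal, hypothetical Python sketch of the filtering step behind Filtered Behavior Cloning. The `Trajectory` structure and the success threshold are illustrative assumptions, not the repository's actual API.

```python
# Minimal sketch of reward-based trajectory filtering (the idea behind
# Filtered Behavior Cloning). Names and the success criterion here are
# hypothetical, not DigiRL's actual implementation.
from dataclasses import dataclass
from typing import List

@dataclass
class Trajectory:
    observations: list
    actions: list
    reward: float  # e.g., 1.0 if the device-control task succeeded, else 0.0

def filter_trajectories(trajs: List[Trajectory], threshold: float = 1.0) -> List[Trajectory]:
    """Keep only trajectories whose reward meets the threshold; behavior
    cloning then trains only on the surviving (observation, action) pairs."""
    return [t for t in trajs if t.reward >= threshold]
```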

Quick Start & Requirements

  • Install: Create a conda environment (conda create -n digirl python==3.10, then conda activate digirl), clone the repo, and run pip install -e . from the repo root.
  • Prerequisites: Python 3.10, a Hugging Face token, a WandB token, a Gemini token, and (for AutoUI) a GPU with at least 12GB of VRAM. Setting up the Android environment requires following a separate README.
  • Data: Download pre-collected trajectories (.pt files) and model checkpoints from provided Hugging Face links or Google Drive.
  • Configuration: Modify scripts/config/main/default.yaml and other relevant config files for specific experiments (a config-override sketch follows this list).
  • Links: Website, Demo, Results, Paper, Checkpoints, Data.
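
As a rough illustration of the configuration step above, the following Python sketch loads the default config and writes an experiment-specific copy. The token keys shown are assumptions inferred from the prerequisites above; check the actual YAML file for the real field names.

```python
# Illustrative sketch of overriding fields in scripts/config/main/default.yaml
# before launching an experiment. The keys below are assumptions, not
# verified field names from the repository.
import yaml

with open("scripts/config/main/default.yaml") as f:
    cfg = yaml.safe_load(f)

# Hypothetical overrides for the required tokens:
cfg["huggingface_token"] = "<your-hf-token>"
cfg["wandb_token"] = "<your-wandb-token>"
cfg["gemini_token"] = "<your-gemini-token>"

with open("scripts/config/main/my_experiment.yaml", "w") as f:
    yaml.safe_dump(cfg, f)
```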

Highlighted Details

  • Supports AutoUI and CogAgent agents.
  • Offers three training modes: Offline-only, Online-only, and Offline-to-online.
  • Includes two Android-in-the-Wild task sets: General and Web Shopping.
  • Features auto-adaptive error handling, multi-machine emulation, and checkpoint resuming.
  • Supports multi-GPU training via accelerate (see the sketch after this list).
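
For the multi-GPU bullet above, this is the standard accelerate pattern (run with accelerate launch), shown here with a placeholder model and dataset rather than DigiRL's actual training loop.

```python
# Generic multi-GPU training pattern with Hugging Face accelerate.
# This illustrates the library's standard usage, not DigiRL's code;
# the model and data are placeholders.
import torch
from accelerate import Accelerator

accelerator = Accelerator()
model = torch.nn.Linear(16, 2)  # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,))),
    batch_size=8,
)

# accelerate shards the dataloader and wraps the model per device
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)
for x, y in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(x), y)
    accelerator.backward(loss)  # handles gradient sync across GPUs
    optimizer.step()
```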

Maintenance & Community

The project is associated with researchers from UC Berkeley, UIUC, and Google DeepMind. Contributions are welcomed via PRs or issues for new algorithms, base models, or task sets.

Licensing & Compatibility

All content, including codebase, data, and model checkpoints, is released under the Apache License v2.0. This license permits commercial use and linking with closed-source projects.

Limitations & Caveats

CogAgent evaluation requires a separate server with at least 48GB of GPU memory. Multi-machine emulation requires additional setup described in a separate README. Multi-GPU DDP is supported, but multi-machine DDP is not. Free-tier Gemini API usage may require increasing timeouts to avoid errors.
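
For the Gemini timeout issue, a generic retry-with-exponential-backoff wrapper like the sketch below can help. Here call_gemini is a placeholder for whatever client call the scripts actually make; the retry counts and delays are assumptions to tune against your rate limits.

```python
# Hedged sketch: generic retry-with-backoff for flaky free-tier API calls.
# `fn` stands in for the real Gemini client call; narrow the exception
# type to your client's error classes in practice.
import time

def with_backoff(fn, *args, retries: int = 5, base_delay: float = 2.0, **kwargs):
    for attempt in range(retries):
        try:
            return fn(*args, **kwargs)
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff

# result = with_backoff(call_gemini, prompt)
```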

Health Check

  • Last Commit: 6 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 30 days

Starred by Hanlin Tang (CTO of Neural Networks at Databricks; cofounder of MosaicML), Amanpreet Singh (cofounder of Contextual AI), and 2 more.

Explore Similar Projects

coach by IntelLabs: Reinforcement learning framework for experimentation (discontinued). 2k stars; created 8 years ago, updated 2 years ago.