Research code for device-control agents via autonomous reinforcement learning
DigiRL is a framework for training device-control agents with autonomous reinforcement learning, targeting in-the-wild Android environments. It is aimed at researchers and developers building agents that must handle complex, real-world interactions, with a focus on offline and offline-to-online learning.
How It Works
The core of DigiRL lies in its two proposed training algorithms: DigiRL (combining automatic curriculum learning with doubly robust estimator filtering) and Filtered Behavior Cloning (employing reward-based filtering). These methods enable three training modes: offline-only, online-only, and offline-to-online, allowing for flexible agent development from pre-collected data to fully interactive learning. The framework supports AutoUI and CogAgent agents and is designed for tasks like general browsing and web shopping on Android.
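The reward-based filtering behind Filtered Behavior Cloning can be illustrated with a minimal sketch. It assumes each trajectory carries a single scalar reward and that filtering keeps trajectories at or above a threshold. The `Trajectory` class, the function names, and the threshold value are hypothetical and are not taken from the DigiRL codebase.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Trajectory:
    # Hypothetical container; the repository's actual data layout may differ.
    observations: List[str]
    actions: List[str]
    reward: float  # assumed trajectory-level reward, e.g. 1.0 on task success


def filter_by_reward(trajectories: List[Trajectory], threshold: float = 1.0) -> List[Trajectory]:
    """Keep only trajectories whose reward reaches the threshold."""
    return [t for t in trajectories if t.reward >= threshold]


def to_bc_pairs(trajectories: List[Trajectory]) -> List[Tuple[str, str]]:
    """Flatten kept trajectories into (observation, action) pairs for supervised training."""
    pairs: List[Tuple[str, str]] = []
    for t in trajectories:
        pairs.extend(zip(t.observations, t.actions))
    return pairs


# Toy data: one successful web-shopping episode and one failed episode.
demo = [
    Trajectory(["home", "search"], ["tap search", "type shoes"], 1.0),
    Trajectory(["home"], ["tap back"], 0.0),
]
kept = filter_by_reward(demo)
pairs = to_bc_pairs(kept)
```

The resulting pairs would then feed an ordinary behavior-cloning loss. The DigiRL algorithm itself additionally uses automatic curriculum learning and doubly robust estimator filtering, which this sketch does not attempt to reproduce.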
Quick Start & Requirements
Install: create and activate a conda environment, clone the repo, and install the package in editable mode:
conda create -n digirl python==3.10
conda activate digirl
pip install -e .
Download the data (.pt files) and model checkpoints from the provided Hugging Face links or Google Drive.
See scripts/config/main/default.yaml and other relevant config files for specific experiments.
Highlighted Details
… accelerate.
Maintenance & Community
The project is associated with researchers from UC Berkeley, UIUC, and Google DeepMind. Contributions are welcomed via PRs or issues for new algorithms, base models, or task sets.
Licensing & Compatibility
All content, including codebase, data, and model checkpoints, is released under the Apache License v2.0. This license permits commercial use and linking with closed-source projects.
Limitations & Caveats
CogAgent evaluation requires a separate server setup with at least 48GB GPU memory. Multi-machine emulation requires additional setup detailed in a separate README. While multi-GPU DDP is supported, multi-machine DDP is not currently supported. Free-tier Gemini API usage may require adjusting timeouts to avoid errors.
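For the free-tier API caveat, one generic mitigation is to retry failed calls with growing delays. The sketch below is a plain retry helper with exponential backoff. It is not the repository's mechanism, which the caveat above describes only as adjusting timeouts. The helper name `call_with_backoff`, its parameters, and the simulated `flaky_eval` call are all hypothetical, and no real Gemini client is used.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with exponentially growing delays."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("max_attempts must be at least 1")


# Simulated evaluator call that fails twice before succeeding (hypothetical).
calls = {"n": 0}


def flaky_eval() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated rate limit")
    return "success"


# Record delays instead of actually sleeping, to keep the example fast.
delays: List[float] = []
result = call_with_backoff(flaky_eval, base_delay=0.5, sleep=delays.append)
```

In real use, `fn` would wrap whatever call the evaluation code makes to the API, and the default `time.sleep` would be left in place.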