Android-Lab by THUDM

Android autonomous agent training and benchmarking framework

Created 1 year ago
251 stars

Top 99.9% on SourcePulse

Project Summary

AndroidLab provides a systematic framework for training and benchmarking autonomous agents on Android devices. It addresses the need for reproducible evaluation of AI agents in complex mobile environments. The project offers a comprehensive benchmark suite and an operation environment, benefiting researchers and developers aiming to build and assess sophisticated Android agents.

How It Works

The framework comprises an operation environment and a reproducible benchmark of 138 tasks across nine Android applications. The apps, including Bluecoins, Calendar, and Maps.me, were selected for their offline functionality, which ensures consistent, reproducible test conditions. AndroidLab supports two execution modes: AVD on Mac (arm64) and Docker on Linux (x86_64). By instruction-tuning on the provided Android Instruct dataset, open-source models can reach performance comparable to proprietary agents under this systematic evaluation and training setup.

Quick Start & Requirements

Installation involves cloning the repository, creating a Python 3.11 Conda environment, and installing dependencies via pip install -r requirements.txt. Users must set up either AVD on Mac (arm64) or Docker on Linux (x86_64) following guides linked within the README. Each concurrent session requires approximately 6GB of memory and 9GB of storage.
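The steps above can be sketched as a shell session. Note that the repository URL and the Conda environment name are assumptions inferred from the project and organization names, not taken from the README excerpt:

```shell
# Clone the repository (URL assumed from the THUDM org and project name)
git clone https://github.com/THUDM/Android-Lab.git
cd Android-Lab

# Create and activate a Python 3.11 Conda environment (name is arbitrary)
conda create -n android-lab python=3.11 -y
conda activate android-lab

# Install the Python dependencies
pip install -r requirements.txt
```

After this, set up either AVD (Mac, arm64) or Docker (Linux, x86_64) following the guides linked in the README, and budget roughly 6GB of memory and 9GB of storage per concurrent session.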

Highlighted Details

  • Features a benchmark of 138 tasks across nine offline-functional Android apps.
  • Enables training of open-source LLMs and LMMs to achieve performance comparable to proprietary models.
  • Supports two distinct execution environments: AVD on Mac (arm64) and Docker on Linux (x86_64).
  • Includes an open-sourced Android Instruct dataset and a complete evaluation framework.

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord, Slack), or roadmap were found in the provided README excerpt.

Licensing & Compatibility

The license type and any compatibility notes for commercial use or closed-source linking are not specified in the provided README excerpt.

Limitations & Caveats

Execution environments are limited to AVD on Mac (arm64) and Docker on Linux (x86_64). Evaluation additionally requires API keys for a judge model such as GPT-4o or GLM4.
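As a minimal sketch, the judge-model keys would typically be supplied as environment variables before running an evaluation. The variable names below are hypothetical assumptions (the excerpt does not specify them); consult the repository README for the actual configuration:

```shell
# Hypothetical variable names -- check the README for the ones the
# framework actually reads before running an evaluation.
export OPENAI_API_KEY="sk-..."    # if using GPT-4o as the judge model
export ZHIPUAI_API_KEY="..."      # if using GLM4 as the judge model
```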

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days
