Android-Lab by THUDM

Android autonomous agent training and benchmarking framework

Created 1 year ago
251 stars

Top 99.9% on SourcePulse

Project Summary

AndroidLab provides a systematic framework for training and benchmarking autonomous agents on Android devices. It addresses the need for reproducible evaluation of AI agents in complex mobile environments. The project offers a comprehensive benchmark suite and an operation environment, benefiting researchers and developers aiming to build and assess sophisticated Android agents.

How It Works

The framework comprises an operation environment and a reproducible benchmark of 138 tasks across nine Android applications. The apps, including Bluecoins, Calendar, and Maps.me, were selected for their offline functionality, which ensures consistent, reproducible test conditions. AndroidLab supports two execution modes: AVD on Mac (arm64) and Docker on Linux (x86_64). By instruction-tuning on the provided Android Instruct dataset, open-source models can reach performance comparable to proprietary agents under this systematic evaluation and training setup.

Quick Start & Requirements

Installation involves cloning the repository, creating a Python 3.11 Conda environment, and installing dependencies via pip install -r requirements.txt. Users must set up either AVD on Mac (arm64) or Docker on Linux (x86_64) following guides linked within the README. Each concurrent session requires approximately 6GB of memory and 9GB of storage.
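The steps above can be sketched as a shell session. Note that the repository URL and the Conda environment name are assumptions inferred from the project and organization names, not taken from the README excerpt:

```shell
# Clone the repository (URL assumed from the THUDM org and project name)
git clone https://github.com/THUDM/Android-Lab.git
cd Android-Lab

# Create and activate a Python 3.11 Conda environment (name is arbitrary)
conda create -n android-lab python=3.11 -y
conda activate android-lab

# Install the Python dependencies
pip install -r requirements.txt
```

After this, set up either AVD (Mac, arm64) or Docker (Linux, x86_64) following the guides linked in the README, and budget roughly 6GB of memory and 9GB of storage per concurrent session.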

Highlighted Details

  • Features a benchmark of 138 tasks across nine offline-functional Android apps.
  • Enables training of open-source LLMs and LMMs to achieve performance comparable to proprietary models.
  • Supports two distinct execution environments: AVD on Mac (arm64) and Docker on Linux (x86_64).
  • Includes an open-sourced Android Instruct dataset and a complete evaluation framework.

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord, Slack), or roadmap were found in the provided README excerpt.

Licensing & Compatibility

The license type and any compatibility notes for commercial use or closed-source linking are not specified in the provided README excerpt.

Limitations & Caveats

Execution environments are limited to AVD on Mac (arm64) and Docker on Linux (x86_64). Evaluation additionally requires API keys for a judge model such as GPT-4o or GLM4.
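As a minimal sketch, the judge-model keys would typically be supplied as environment variables before running an evaluation. The variable names below are hypothetical assumptions (the excerpt does not specify them); consult the repository README for the actual configuration:

```shell
# Hypothetical variable names -- check the README for the ones the
# framework actually reads before running an evaluation.
export OPENAI_API_KEY="sk-..."    # if using GPT-4o as the judge model
export ZHIPUAI_API_KEY="..."      # if using GLM4 as the judge model
```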

Health Check

  • Last Commit: 3 months ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days
