GUI agent for context-aware action grounding from instructions
Aria-UI is an open-source project providing fast, context-aware action grounding for GUI/computer-use agents. It translates natural language instructions into precise pixel coordinates on graphical user interfaces, enabling agents to interact with software. The project targets developers building autonomous agents for tasks like UI automation, testing, and assistive technologies.
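As a rough illustration of the last step in that pipeline, the sketch below maps a relative coordinate prediction onto a concrete screenshot. It assumes the model emits a text answer containing an (x, y) pair on a 0 to 1000 relative scale; that output format, the `scale` default, and the helper name `to_pixels` are assumptions made for illustration and are not taken from the description above.

```python
import re


def to_pixels(model_output: str, width: int, height: int, scale: int = 1000) -> tuple[int, int]:
    """Map a relative (x, y) prediction onto a screenshot of the given size.

    Hypothetical helper: assumes the first two numbers in `model_output`
    are x and y on a 0..scale relative grid.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output)
    if len(numbers) < 2:
        raise ValueError(f"no coordinate pair found in {model_output!r}")
    x_rel, y_rel = float(numbers[0]), float(numbers[1])

    # Clamp so a slightly out-of-range prediction still lands on screen.
    x_rel = min(max(x_rel, 0.0), float(scale))
    y_rel = min(max(y_rel, 0.0), float(scale))

    # Scale to pixels and keep the result inside the last valid pixel index.
    x_px = min(int(x_rel / scale * width), width - 1)
    y_px = min(int(y_rel / scale * height), height - 1)
    return x_px, y_px
```

For a 1920x1080 screenshot, `to_pixels("(500, 250)", 1920, 1080)` gives a click target a quarter of the way down the horizontal midline. Working in relative units keeps the prediction independent of the input resolution, which matters when screenshots vary in size.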
How It Works
Aria-UI employs a mixture-of-experts (MoE) architecture with 3.9B activated parameters per token. It processes variable-sized GUI inputs, including interleaved text and images, to understand instructions in context. This design encodes visual information efficiently and uses historical context to improve grounding accuracy, which the project reports as state-of-the-art performance on agent benchmarks.
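To make the phrase "activated parameters per token" more concrete, here is a minimal sketch of generic top-k expert routing, the mechanism by which an MoE layer runs only a few experts for each token. It is not Aria-UI's actual router: the expert functions, the number of experts, and `k=2` are all hypothetical and chosen purely for illustration.

```python
import math


def top_k_route(router_logits: list[float], k: int = 2) -> dict[int, float]:
    """Pick the k highest-scoring experts and softmax-normalise their weights."""
    ranked = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    peak = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x: float, experts, router_logits: list[float], k: int = 2) -> float:
    """Run only the selected experts and mix their outputs by routing weight."""
    weights = top_k_route(router_logits, k)
    # Experts that were not selected are never called, so their
    # parameters are not "activated" for this token.
    return sum(w * experts[i](x) for i, w in weights.items())


# Hypothetical toy experts standing in for feed-forward blocks.
toy_experts = [lambda x: x, lambda x: 2 * x, lambda x: 0.0, lambda x: 4 * x]
```

With four toy experts and `k=2`, each call evaluates two experts and skips the rest, which is why the activated parameter count of an MoE model is smaller than its total parameter count.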
Quick Start & Requirements
Install the core dependencies:
pip install transformers==4.45.0 accelerate==0.34.1 sentencepiece==0.2.0 torchvision requests torch Pillow
Also install flash-attn and, optionally, grouped_gemm.
To install the vLLM wheel (with VLLM_COMMIT set):
pip install https://wheels.vllm.ai/${VLLM_COMMIT}/vllm-1.0.0.dev-cp38-abi3-manylinux1_x86_64.whl
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats