THUDM: Visual foundation agents benchmark for LMMs
Top 98.5% on SourcePulse
Summary
VisualAgentBench (VAB) is a benchmark for systematically evaluating and developing Large Multimodal Models (LMMs) as visual foundation agents. The suite covers three task categories (Embodied, GUI, and Visual Design) across five distinct environments. It lets researchers and practitioners assess LMM capabilities in visually grounded interactive scenarios, and its trajectory training dataset supports the development of more capable visual agents.
How It Works
VAB builds upon the AgentBench framework, employing an Agent-Controller, Task-Controller, and Assigner architecture for efficient, parallelized agent evaluation. Its core innovation lies in offering a trajectory training set specifically designed for behavior cloning (BC). This allows open Large Language Models (LLMs) and LMMs to be trained on agent task trajectories, enhancing their ability to follow complex instructions and perform visual tasks, a capability often lacking in base models.
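To make the behavior cloning idea concrete, here is a minimal sketch of how a recorded agent trajectory could be unrolled into supervised (prompt, target action) pairs. The `Step` class, the prompt dictionary layout, and the function name `trajectory_to_bc_examples` are all hypothetical and chosen for illustration; the actual VAB data format is not described in this summary and may differ.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class Step:
    """One step of an expert trajectory (hypothetical layout, not VAB's format)."""
    observation: str  # e.g. a screenshot path or a textual scene description
    action: str       # the action the expert agent took at this step


def trajectory_to_bc_examples(
    instruction: str, steps: List[Step]
) -> List[Tuple[Dict[str, Any], str]]:
    """Unroll one trajectory into (prompt, target_action) training pairs.

    Behavior cloning treats each expert action as a supervised label for
    the context the agent saw at that point: the task instruction, the
    actions taken so far, and the current observation.
    """
    examples = []
    history: List[str] = []
    for step in steps:
        prompt = {
            "instruction": instruction,
            "history": list(history),  # copy so later appends do not leak in
            "observation": step.observation,
        }
        examples.append((prompt, step.action))
        history.append(step.action)
    return examples
```

Each resulting pair can then be fed to an ordinary supervised finetuning loop, with the prompt as model input and the expert action as the target output.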
Quick Start & Requirements
Setup involves cloning the repository, creating and activating a Conda environment (python=3.9), and installing dependencies (pip install -r requirements.txt). Docker is a prerequisite. Users must configure their OpenAI API Key in configs/agents/openai-chat.yaml. To run tasks, first start the task server (python -m src.start_task -a), which typically takes about a minute to launch four workers. Subsequently, initiate the evaluation via the assigner (python -m src.assigner --auto-retry). Specific environments like VAB-WebArena-Lite may have additional setup instructions.
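The two run commands above could be chained from a small Python driver, sketched below. The module names and flags come from the quick-start text; the 60-second wait mirrors the "about a minute" startup estimate, and shutting the task server down once the assigner finishes is an assumption of this sketch rather than documented behavior. The function name `run_evaluation` and its `dry_run` parameter are hypothetical. The script is meant to be run from the repository root, inside the activated Conda environment.

```python
import subprocess
import sys
import time
from typing import List

# Commands as given in the quick-start text, run with the current interpreter.
START_TASK_CMD = [sys.executable, "-m", "src.start_task", "-a"]
ASSIGNER_CMD = [sys.executable, "-m", "src.assigner", "--auto-retry"]


def run_evaluation(startup_wait_s: float = 60.0, dry_run: bool = False) -> List[List[str]]:
    """Start the task server, wait for its workers, then run the assigner.

    With dry_run=True nothing is executed and the planned commands are
    returned, which is handy for checking the setup first.
    """
    planned = [START_TASK_CMD, ASSIGNER_CMD]
    if dry_run:
        return planned

    # Launch the task server in the background.
    server = subprocess.Popen(START_TASK_CMD)
    try:
        # Assumption: a fixed wait is enough for the workers to come up.
        time.sleep(startup_wait_s)
        # Run the evaluation; raises CalledProcessError on a non-zero exit.
        subprocess.run(ASSIGNER_CMD, check=True)
    finally:
        # Assumption: the server is no longer needed after the assigner exits.
        server.terminate()
        server.wait()
    return planned
```

Calling `run_evaluation(dry_run=True)` returns the two command lists without starting anything; calling `run_evaluation()` performs the actual run. A fixed sleep is the simplest option here; polling the server for readiness would be more robust if it exposes a way to do so.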
Highlighted Details
Maintenance & Community
No specific community channels (e.g., Discord, Slack) or maintenance indicators (e.g., sponsorships, active development signals) are detailed in the provided README snippet.
Licensing & Compatibility
Licensing information is not specified in the provided README content.
Limitations & Caveats
The VAB-Mobile environment is currently marked as "Ongoing." VAB-WebArena-Lite requires a separate installation and evaluation procedure. Open LMMs generally struggle with complex agent task instructions without prior finetuning on the VAB training dataset.
10 months ago
Inactive