Benchmark for robotics manipulation and embodied agents
VLABench is a large-scale benchmark suite for evaluating Vision-Language-Action models (VLAs) and Vision-Language Models (VLMs) on robotics manipulation tasks. It targets researchers and engineers working on embodied AI and language-conditioned robotics, and provides a standardized framework for assessing long-horizon reasoning and generalization.
How It Works
VLABench uses a modular framework for task construction, so tasks can be adapted and extended. It offers standardized benchmark datasets along six dimensions: in-distribution performance, cross-category generalization, common-sense reasoning, semantic instruction following, cross-task transfer, and visual robustness to texture variations. The evaluation framework is designed to give fair comparisons across different models and machines.
Quick Start & Requirements
Create a conda environment (conda create -n vlabench python=3.10), activate it, install requirements (pip install -r requirements.txt), and install VLABench locally (pip install -e .).
Download assets with python scripts/download_assets.py
Initialize submodules with git submodule update --init --recursive
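The setup steps above can be collected into one script. This is a minimal sketch, not the project's own installer: it assumes it is run from the root of a VLABench checkout, that conda and git are already on the PATH, and that the steps run in the order listed. The run helper, the setup_vlabench function, and the DRY_RUN variable are hypothetical names introduced here for illustration. By default the script only prints the commands.

```shell
#!/usr/bin/env bash
# Setup sketch for a VLABench checkout; run from the repository root.
# DRY_RUN is a hypothetical switch added for this sketch:
#   DRY_RUN=1 (default) prints each command, DRY_RUN=0 also executes it.
set -e

run() {
  # Print the command, then execute it only when DRY_RUN=0.
  printf '+ %s\n' "$*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

setup_vlabench() {
  run conda create -n vlabench python=3.10
  # Assumption: in a non-interactive shell, conda's shell hook must be
  # loaded before `conda activate` is available.
  if [ "${DRY_RUN:-1}" = "0" ]; then
    eval "$(conda shell.bash hook)"
  fi
  run conda activate vlabench
  run pip install -r requirements.txt
  run pip install -e .
  run python scripts/download_assets.py
  run git submodule update --init --recursive
}

setup_vlabench
```

Saved under a name of your choice, the script prints the six commands when run with bash, and executes them when DRY_RUN=0 is set in the environment. Keeping the dry run as the default makes it easy to review the commands before anything is installed.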
Highlighted Details
Maintenance & Community
The project is actively maintained; recent updates include parallel data collection, camera augmentation, and the release of fine-tuned checkpoints. The authors encourage community contributions via issues and pull requests, and plan to release a comprehensive infrastructure framework that includes training pipelines and a leaderboard.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The preview version's functionality is still being organized and tested. The current data collection scripts do not support multi-processing within the code, though parallelization is planned. Conversion to RLDS format is noted as time-consuming when run as a single process, and the original repository's code for this conversion may contain bugs.