VLABench by OpenMOSS

Benchmark for robotics manipulation and embodied agents

created 7 months ago
274 stars

Top 94.3% on SourcePulse

Project Summary

VLABench is a large-scale benchmark suite for evaluating Vision-Language-Action (VLA) models and Vision-Language Models (VLMs) on robotics manipulation tasks. It targets researchers and engineers working on embodied AI and language-conditioned robotics, providing a standardized framework for assessing long-horizon reasoning and generalization.

How It Works

VLABench uses a modular framework for task construction, making tasks easy to adapt and extend. It offers standardized benchmark datasets across several evaluation dimensions: in-distribution performance, cross-category generalization, common-sense reasoning, semantic instruction following, cross-task transfer, and visual robustness to texture variations. The evaluation framework is designed to ensure fair comparisons across different models and machines.
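The evaluation dimensions above can be pictured as a set of tracks that a policy is scored on independently. The sketch below is purely illustrative: every identifier in it (`EvalTrack`, `TrackResult`, `run_benchmark`) is hypothetical and not part of the actual VLABench API; only the track names mirror the dimensions listed above.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Dict, List

# Hypothetical track names mirroring VLABench's stated evaluation
# dimensions; these identifiers are NOT from the VLABench codebase.
class EvalTrack(Enum):
    IN_DISTRIBUTION = auto()
    CROSS_CATEGORY = auto()
    COMMON_SENSE = auto()
    SEMANTIC_INSTRUCTION = auto()
    CROSS_TASK = auto()
    TEXTURE_ROBUSTNESS = auto()

@dataclass
class TrackResult:
    track: EvalTrack
    success_rate: float  # fraction of episodes scored as success

def run_benchmark(policy: Callable[[str], float],
                  tracks: List[EvalTrack]) -> Dict[EvalTrack, "TrackResult"]:
    """Score the policy on each track and collect per-track success rates."""
    results = {}
    for track in tracks:
        # A real harness would roll out episodes in simulation per track;
        # here the policy is a stand-in mapping a track name to a score.
        score = policy(track.name)
        results[track] = TrackResult(track=track, success_rate=score)
    return results
```

Keeping each generalization dimension as a separate track is what allows fair comparison: every model is scored on the same fixed set of axes rather than a single aggregate number.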

Quick Start & Requirements

  • Installation: Clone the repository, create a conda environment (conda create -n vlabench python=3.10), activate it, install requirements (pip install -r requirements.txt), and install VLABench locally (pip install -e .).
  • Assets: Download necessary assets using python scripts/download_assets.py.
  • Submodules: Initialize submodules with git submodule update --init --recursive.
  • Prerequisites: Python 3.10, Conda.
  • Resources: Data collection can be parallelized. Evaluation of each task can take 30 minutes to 1 hour per process.
  • Links: Paper, Project Website, Hugging Face.
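Put together, the installation steps above amount to the following sequence. The repository URL is an assumption (the summary does not state it); check the project page for the canonical one.

```shell
# Assumed repository URL -- verify against the project page.
git clone https://github.com/OpenMOSS/VLABench.git
cd VLABench
git submodule update --init --recursive

# Create and activate the environment, then install dependencies
conda create -n vlabench python=3.10
conda activate vlabench
pip install -r requirements.txt
pip install -e .

# Download the benchmark assets
python scripts/download_assets.py
```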

Highlighted Details

  • Supports multiple evaluation tracks focusing on different generalization capabilities.
  • Provides standardized fine-tuning datasets for primitive tasks.
  • Offers scripts for converting data to RLDS and Libero formats.
  • Includes evaluation pipelines for popular VLA models such as OpenVLA and Open-Pi, and for VLMs such as GPT-4V, Qwen2-VL, and LLaVA.
  • Supports multi-GPU accelerated evaluation for faster benchmarking.

Maintenance & Community

The project is actively maintained, with recent updates including parallel data collection, camera augmentation, and the release of finetuned checkpoints. The authors encourage community contributions via issues and pull requests and plan to release a comprehensive infra framework, including training pipelines and a leaderboard.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Functionality in the preview version is still being tested. The current data collection scripts do not support multi-processing within the code, though parallelization is planned. Conversion to RLDS format is time-consuming with a single process, and the upstream conversion code may contain bugs.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 4
  • Issues (30d): 16
  • Star History: 17 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), Jeff Hammerbacher (cofounder of Cloudera), and 16 more.

open-r1 by huggingface — SDK for reproducing DeepSeek-R1 (top 0.3% on SourcePulse, 25k stars, created 6 months ago, updated 5 days ago)