LIBERO-plus by sylvestf

Benchmark for analyzing Vision-Language-Action model robustness

Created 6 months ago
283 stars

Top 92.1% on SourcePulse

Project Summary

This repository provides LIBERO-plus, a generalized benchmark designed for in-depth robustness analysis of Vision-Language-Action (VLA) models. It systematically exposes vulnerabilities in contemporary VLA models through comprehensive evaluations across seven perturbation dimensions, enabling researchers and engineers to rigorously assess and improve model resilience. The primary benefit is a standardized, detailed methodology for understanding VLA model weaknesses in realistic, varied conditions.

How It Works

LIBERO-plus introduces a benchmark suite comprising 10,030 tasks, designed to evaluate VLA models against seven distinct perturbation categories: Objects Layout, Camera Viewpoints, Robot Initial States, Language Instructions, Light Conditions, Background Textures, and Sensor Noise. This approach allows for the identification of specific failure modes, such as extreme sensitivity to environmental changes or a lack of genuine language understanding, offering a more profound insight into model robustness than standard benchmarks.

Quick Start & Requirements

  • Primary install / run command: Clone the repository, uninstall any existing LIBERO installation, then run pip install -e . from the cloned directory.
  • Non-default prerequisites and dependencies: Requires the system packages libexpat1, libfontconfig1-dev, libpython3-stdlib, and libmagickwand-dev (installed via apt), plus pip install -r extra_requirements.txt. Additionally, users must download the project's assets and unzip them into the /LIBERO-plus/libero/libero/assets/ directory.
  • Links: The README mentions categories for a paper (arXiv:2510.13626), assets, website, model, RLDS dataset, and LEROBOT dataset, but does not provide direct URLs for these resources.
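Putting the steps above together, a typical setup session might look like the following sketch. The clone URL, the package name passed to pip uninstall, and the asset archive name are placeholders, not confirmed by the README; substitute the actual values it provides.

```shell
# Sketch of the LIBERO-plus setup described above.
# <LIBERO-plus-repo-url> and assets.zip are placeholders.

pip uninstall -y libero                 # remove any existing LIBERO install (package name assumed)
git clone <LIBERO-plus-repo-url>        # clone the repository
cd LIBERO-plus

# Non-default system and Python dependencies
sudo apt-get install -y libexpat1 libfontconfig1-dev libpython3-stdlib libmagickwand-dev
pip install -r extra_requirements.txt

# Editable install from the repository root
pip install -e .

# Unpack the downloaded assets into the directory the benchmark expects
unzip assets.zip -d libero/libero/assets/
```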

Highlighted Details

  • Key Findings: VLA models demonstrate significant fragility, with performance dropping drastically (from 95% to below 30%) under modest camera viewpoint and robot initial state perturbations. Models often exhibit "Language Ignorance," functioning more as Vision-Action systems.
  • Benchmark Scope: Evaluates 10,030 tasks across 7 perturbation dimensions, including object layout, camera viewpoints, robot states, language, lighting, background, and sensor noise.
  • Evaluated Models: Includes OpenVLA and its variants (OFT, OFT_w, OFT_m), π₀, π₀-fast, Nora, WorldVLA, UniVLA, and RIPT-VLA.
  • Performance: The OpenVLA-OFT+ model, fine-tuned on LIBERO-plus, achieves an overall score of 79.6% on the benchmark.

Maintenance & Community

The project aims to be community-driven, encouraging users to submit pull requests adding research works that adopt LIBERO-plus. Specific community channels such as Discord or Slack are not detailed in the README.

Licensing & Compatibility

The provided README does not explicitly state the software license for the LIBERO-plus repository. This absence of clear licensing information may pose compatibility concerns for commercial use or integration into closed-source projects.

Limitations & Caveats

The benchmark highlights inherent limitations in current VLA models, such as extreme sensitivity to environmental variations and a tendency to disregard language instructions. The setup process requires careful uninstallation of prior LIBERO versions and manual asset management, which could be a minor adoption hurdle. The lack of explicit licensing is a significant caveat for potential adopters.

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 2
  • Star History: 39 stars in the last 30 days
