RynnVLA-002 by alibaba-damo-academy

Autoregressive action world model for robotics

Created 6 months ago

831 stars

Top 42.8% on SourcePulse

Project Summary

WorldVLA is an autoregressive action world model that unifies vision, language, and action understanding and generation for robotics. It targets researchers and developers in embodied AI and robotics, and supports two tasks: generating robot actions from text and image inputs, and predicting future visual states from actions.

How It Works

WorldVLA integrates a Vision-Language-Action (VLA) model for action generation and a world model for next-frame prediction within a single framework. It leverages the autoregressive capabilities of large language models, adapted for multimodal inputs (images and actions), to predict sequences of actions or future visual states. This unified approach aims to improve the coherence and efficiency of robot control and simulation.
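The two roles described above can be pictured as two prompt layouts fed to the same autoregressive decoder. The following is only a toy sketch: the marker tokens (`<image>`, `<action>`), the helper names, and the `stub_policy` stand-in are all hypothetical and are not taken from the project's code or vocabulary.

```python
from typing import Callable, List

# Hypothetical marker tokens; the real vocabulary is not given in this summary.
BOI, EOI = "<image>", "</image>"
BOA, EOA = "<action>", "</action>"


def wrap(start: str, body: List[str], end: str) -> List[str]:
    """Surround a run of modality tokens with its start/end markers."""
    return [start, *body, end]


def action_model_prompt(text: List[str], image: List[str]) -> List[str]:
    """Instruction + current observation; the model continues with action tokens."""
    return [*text, *wrap(BOI, image, EOI), BOA]


def world_model_prompt(image: List[str], action: List[str]) -> List[str]:
    """Current observation + action; the model continues with next-frame tokens."""
    return [*wrap(BOI, image, EOI), *wrap(BOA, action, EOA), BOI]


def generate(prompt: List[str], next_token: Callable[[List[str]], str],
             stop: str, max_new: int = 64) -> List[str]:
    """Plain autoregressive loop: append one token at a time until `stop`."""
    seq, out = list(prompt), []
    for _ in range(max_new):
        tok = next_token(seq)
        if tok == stop:
            break
        seq.append(tok)
        out.append(tok)
    return out


def stub_policy(seq: List[str]) -> str:
    """Stand-in for a trained model: emits two fake action tokens, then stops."""
    last_boa = max(i for i, t in enumerate(seq) if t == BOA)
    produced = len(seq) - 1 - last_boa
    return f"act_{produced}" if produced < 2 else EOA


prompt = action_model_prompt(["pick", "up", "the", "bowl"], ["img_17", "img_4"])
actions = generate(prompt, stub_policy, stop=EOA)
print(actions)
```

The point of the sketch is that only the prompt layout and the stop token change between the two tasks; the decoding loop is shared, which is what a single unified framework implies.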

Quick Start & Requirements

Installation: Create a Conda environment (conda env create -f environment.yml), clone the LIBERO repository (git clone https://github.com/Lifelong-Robot-Learning/LIBERO.git), and install it from inside the cloned directory (pip install -e .).
Prerequisites: LIBERO dataset and Chameleon model weights (used for the tokenizer and as the starting checkpoint).
Resources: Training involves significant data preparation and computational resources.
Links: Hugging Face Checkpoints, arXiv Paper.
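The installation steps listed above could be scripted roughly as follows. This is a minimal sketch, not the project's own setup script: the function names are hypothetical, and it assumes the editable install is meant to run inside the freshly cloned LIBERO directory. The name of the Conda environment is not given here, so activating it before the pip step is left to the reader.

```python
import subprocess
from pathlib import Path

LIBERO_URL = "https://github.com/Lifelong-Robot-Learning/LIBERO.git"


def setup_steps(workdir="."):
    """Return (command, working directory) pairs for the three install steps."""
    root = Path(workdir)
    return [
        (["conda", "env", "create", "-f", "environment.yml"], root),
        (["git", "clone", LIBERO_URL], root),
        # Assumption: the editable install is run inside the LIBERO clone.
        (["pip", "install", "-e", "."], root / "LIBERO"),
    ]


def run_setup(workdir=".", dry_run=True):
    """Print each step; execute it only when dry_run is False."""
    for cmd, cwd in setup_steps(workdir):
        print(f"[{cwd}] {' '.join(cmd)}")
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)


run_setup()  # dry run: only prints the commands
```

Defaulting to a dry run keeps the sketch safe to execute anywhere; passing `dry_run=False` would actually invoke the commands and stop at the first failure because of `check=True`.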

Highlighted Details

Achieves high success rates on the LIBERO benchmark for action generation (e.g., 96.2% for LIBERO-Object at 512x512 resolution).
Supports both 256x256 and 512x512 image resolutions.
Provides detailed scripts for data preparation, training, and evaluation.
Built upon foundational models like Lumina-mGPT, Chameleon, and OpenVLA.
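To see why the choice between the two supported resolutions matters for an autoregressive model, it helps to estimate how many discrete tokens one image contributes to the sequence. The sketch below assumes a tokenizer that maps each 16x16 pixel patch to one token; that downsampling factor is an assumption for illustration and is not stated in this summary.

```python
def image_token_count(height, width, downsample=16):
    """Tokens per image when each downsample x downsample patch maps to one code."""
    if height % downsample or width % downsample:
        raise ValueError("image size must be a multiple of the downsampling factor")
    return (height // downsample) * (width // downsample)


for side in (256, 512):
    # Assumed factor of 16: 256x256 -> 256 tokens, 512x512 -> 1024 tokens.
    print(f"{side}x{side}: {image_token_count(side, side)} tokens")
```

Under that assumption, doubling the side length quadruples the number of image tokens per frame, so the higher resolution costs noticeably more sequence length for every observation the model reads or predicts.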

Maintenance & Community

The project was released on June 23, 2025, with code for the action model on the LIBERO benchmark. Future releases are planned for the world model and real-world experiments.

Explore Similar Projects

Motus by thu-ml

Hybrid-VLA by PKU-HMI-Lab

Large-VLM-based-VLA-for-Robotic-Manipulation by JiuTian-VL

vla0 by NVlabs

Instruct2Act by OpenGVLab

molmoact by allenai

CogACT by microsoft

OpenDriveVLA by DriveVLA

UniVLA by OpenDriveLab

SimpleVLA-RL by PRIME-RL

Awesome-Robotics-Foundation-Models by robotics-survey

Isaac-GR00T by NVIDIA