Robbyant
Causal video-action world model for robot control
LingBot-VA is a causal video-action world model designed for generalist robot control. It addresses the challenge of unifying visual dynamics prediction and action inference within a single framework, enabling robots to learn and execute complex tasks more efficiently and with greater generalization capabilities. The project targets researchers and engineers in robotics and AI, offering a novel approach to world modeling that leads to state-of-the-art performance in simulated and real-world robotic manipulation.
How It Works
LingBot-VA takes an autoregressive approach to video-action world modeling. Its core idea is to place visual dynamics prediction and action inference in a single interleaved sequence while keeping their conceptual roles distinct. It does this with a dual-stream Mixture-of-Transformers (MoT) architecture that uses asynchronous execution and a KV cache for efficient inference. The project reports gains in sample efficiency, long-horizon task success rates, and generalization to novel environments.
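To make the interleaving and caching idea concrete, here is a minimal toy sketch in plain Python. It is not LingBot-VA's code: the names `KVCache`, `interleave`, `attend`, and `run_causal` are hypothetical, the projections are identity maps, and the tokens are short lists of floats. It only illustrates how video and action tokens can share one causal sequence while a KV cache avoids recomputing earlier steps.

```python
import math
from dataclasses import dataclass, field


@dataclass
class KVCache:
    """Hypothetical cache holding keys and values for all past tokens."""
    keys: list = field(default_factory=list)
    values: list = field(default_factory=list)

    def append(self, key, value):
        self.keys.append(key)
        self.values.append(value)


def interleave(video_tokens, action_tokens):
    """Build one sequence [v0, a0, v1, a1, ...], tagging each token's modality."""
    sequence = []
    for video, action in zip(video_tokens, action_tokens):
        sequence.append(("video", video))
        sequence.append(("action", action))
    return sequence


def attend(query, cache):
    """Softmax attention of a single query over everything in the cache."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in cache.keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(cache.values[0])
    return [sum(w * v[i] for w, v in zip(weights, cache.values)) / total
            for i in range(dim)]


def run_causal(sequence):
    """Process tokens in order; each token sees only itself and earlier tokens."""
    cache = KVCache()
    outputs = []
    for modality, token in sequence:
        # Assumption: identity projections. A dual-stream model would apply
        # separate weights depending on `modality` before caching.
        cache.append(token, token)
        outputs.append((modality, attend(token, cache)))
    return outputs


if __name__ == "__main__":
    seq = interleave([[1.0, 0.0], [0.0, 1.0]], [[0.5, 0.5], [0.2, 0.8]])
    for modality, vector in run_causal(seq):
        print(modality, [round(x, 3) for x in vector])
```

Because each step only appends to the cache, extending the sequence with a new observation or action leaves earlier outputs unchanged, which is the property that makes cached autoregressive rollout cheap.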
Quick Start & Requirements
pip install torch==2.9.0 torchvision==0.24.0 torchaudio==2.9.0 --index-url https://download.pytorch.org/whl/cu126
pip install websockets einops diffusers==0.36.0 transformers==4.55.2 accelerate msgpack opencv-python matplotlib ftfy easydict
pip install flash-attn --no-build-isolation
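After running the commands above, it can help to confirm that the environment matches the pinned versions. The following is a small sketch using only the standard library; the `PINS` table copies the versions from the install commands, while `installed_version` and `check_pins` are hypothetical helper names, not part of the project.

```python
from importlib import metadata

# Versions copied from the pip commands above.
PINS = {
    "torch": "2.9.0",
    "torchvision": "0.24.0",
    "torchaudio": "2.9.0",
    "diffusers": "0.36.0",
    "transformers": "4.55.2",
}


def installed_version(name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return {package: (expected, found)} for every package that does not match."""
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        # Wheels from the CUDA index carry a local tag such as "+cu126";
        # strip it before comparing.
        base = found.split("+", 1)[0] if found else None
        if base != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINS)
    for name, (expected, found) in problems.items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
    if not problems:
        print("All pinned packages match.")
```

The `lookup` parameter exists so the check can be exercised without the packages installed, for example by passing a dictionary's `get` method.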
Evaluation uses the RoboTwin environment: install it by following its guide (https://robotwin-platform.github.io/doc/usage/robotwin-install.html), modify its requirements.txt (e.g., transforms3d==0.4.2, sapien==3.0.0b1, gymnasium==0.29.1, huggingface_hub==0.36.2), and run the provided installation scripts.
Highlighted Details
Maintenance & Community
Weights and code for the shared backbone were released on January 29, 2026. For questions, discussions, or collaborations, users can open an issue on GitHub or contact Dr. Qihang Zhang (liuhuan.zqh@antgroup.com) or Dr. Lin Li (fengchang.ll@antgroup.com).
Licensing & Compatibility
This project is released under the Apache License 2.0. This permissive license allows commercial use and integration with closed-source projects, provided the license text and attribution notices are retained.
Limitations & Caveats
The provided documentation does not explicitly detail known limitations, alpha status, or specific bugs. The setup for evaluation, particularly with the RoboTwin environment, involves modifying existing scripts and dependencies, which may require careful configuration and troubleshooting.