amap-cvlab
Physically consistent world model for robotic manipulation
Top 86.6% on SourcePulse
Summary
ABot-PhysWorld generates physically consistent, action-controllable videos for robotic manipulation. It is built on a 14B-parameter Diffusion Transformer and targets researchers and engineers in embodied AI and robotics who need realistic, zero-shot simulation and control of robot-object interaction. Its key benefit is physically grounded world modeling, which is intended to make robotic systems more robust and generalizable.
How It Works
ABot-PhysWorld employs a 14B-parameter Diffusion Transformer (DiT) enhanced with physics-aware training and precise action control. For physics-aware Direct Preference Optimization (DPO), it uses a decoupled VLM-based discriminator (Qwen3-VL, Gemini 3 Pro) that generates task-specific physics checklists to enforce plausibility. Action conditioning is handled by Parallel Context Blocks, which residually inject spatial action maps into the DiT; this preserves the model's physical priors while enabling cross-embodiment control. The project also introduces EZS-Bench, a zero-shot evaluation benchmark for physically consistent video generation in robotics.
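To make the preference-optimization step more concrete, here is a minimal sketch of the generic DPO objective applied to a pair chosen from checklist results. It is not the project's implementation: the helper names (`pick_pair`, `dpo_loss`), the pass-rate rule for choosing the preferred video, and the use of scalar log-likelihoods are all assumptions made for illustration.

```python
import math

def pick_pair(candidates):
    """Choose (preferred, rejected) video ids from checklist results.

    `candidates` is a list of (video_id, checklist) tuples, where each
    checklist is a non-empty list of booleans (True = physics check passed).
    Ranking by pass rate is a hypothetical rule, not taken from the project.
    """
    ranked = sorted(candidates, key=lambda c: sum(c[1]) / len(c[1]))
    return ranked[-1][0], ranked[0][0]

def dpo_loss(logp_win, logp_lose, ref_logp_win, ref_logp_lose, beta=0.1):
    """Generic DPO loss for one preference pair.

    Computes -log(sigmoid(beta * margin)), where the margin compares how much
    the trained model favors the preferred sample over the rejected one,
    relative to a frozen reference model.
    """
    margin = (logp_win - ref_logp_win) - (logp_lose - ref_logp_lose)
    x = beta * margin
    # Numerically stable form of log(1 + exp(-x)).
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))

# Example: three generated clips judged against a four-item checklist.
clips = [
    ("clip_a", [True, True, False, True]),
    ("clip_b", [True, False, False, False]),
    ("clip_c", [True, True, True, True]),
]
preferred, rejected = pick_pair(clips)
loss = dpo_loss(logp_win=-1.0, logp_lose=-3.0,
                ref_logp_win=-2.0, ref_logp_lose=-2.0, beta=0.5)
```

The loss falls below log 2 once the trained model favors the preferred clip more strongly than the reference model does, which is the signal that pushes generation toward clips that pass more physics checks.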
Quick Start & Requirements
Installation involves creating a conda environment (python=3.10), installing PyTorch with CUDA 12.1 support (cu121), and then installing the project dependencies (`pip install -r requirements.txt`). GPUs with >= 60 GB of VRAM are recommended for optimal performance; GPUs with >= 24 GB are supported with tiled VAE. The large VLM judge model (~150 GB) is downloaded automatically on first run. Inference can be initiated using `python inference.py
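The hardware and download figures above can be turned into a small pre-flight check before the first run. The sketch below is illustrative only: the function name `preflight`, the mode labels `"full"` and `"tiled"`, and the idea of checking free disk space are assumptions, not part of the project's tooling. Only the thresholds (60 GB, 24 GB, ~150 GB) come from the text.

```python
import shutil

FULL_VRAM_GB = 60      # recommended VRAM for optimal performance
TILED_VRAM_GB = 24     # minimum VRAM, supported with tiled VAE
JUDGE_MODEL_GB = 150   # approximate size of the auto-downloaded VLM judge

def preflight(vram_gb, free_disk_gb):
    """Return a hypothetical VAE mode label, or raise if resources are short."""
    if free_disk_gb < JUDGE_MODEL_GB:
        raise RuntimeError(
            f"need about {JUDGE_MODEL_GB} GB free disk for the judge model, "
            f"found {free_disk_gb:.0f} GB"
        )
    if vram_gb >= FULL_VRAM_GB:
        return "full"
    if vram_gb >= TILED_VRAM_GB:
        return "tiled"
    raise RuntimeError(
        f"need at least {TILED_VRAM_GB} GB VRAM, found {vram_gb:.0f} GB"
    )

def free_disk_gb(path="."):
    """Free space on the filesystem containing `path`, in GB (10**9 bytes)."""
    return shutil.disk_usage(path).free / 1e9

# Example with explicit numbers: a 48 GB GPU and 400 GB of free disk.
mode = preflight(vram_gb=48, free_disk_gb=400)
```

In practice the VRAM figure would come from the GPU driver or framework, and `free_disk_gb()` would be pointed at whichever directory the model cache uses.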