boundless-world-model  by boundless-large-model

Physically consistent world models for embodied intelligence and robotic simulation

Created 2 months ago
1,829 stars

Top 23.2% on SourcePulse

Project Summary

Boundless-World-Model (BWM) addresses the need for high-fidelity, physically consistent simulators for embodied intelligence research, particularly in robotic manipulation. It provides an action-conditioned video world model built upon Wan2.2-TI2V-5B, enabling low-cost simulation for researchers and developers. The project aims to accelerate the development of general embodied AI by offering a robust simulation environment.

How It Works

BWM takes initial frames and action inputs and generates physically consistent, high-fidelity video of the resulting robotic interaction, using Wan2.2-TI2V-5B as its base architecture. The model supports autoregressive rollouts over long horizons while maintaining visual realism and physical plausibility. This lets researchers train and evaluate embodied agents without costly real-world experiments.
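The general shape of an autoregressive, action-conditioned rollout can be sketched as follows. This is a minimal illustration and not BWM's implementation: the README does not describe its rollout mechanism, so the chunking scheme, the choice to condition only on the most recent frame, and the names rollout, predict_chunk, and toy_predict_chunk are all assumptions. Frames and actions are plain lists of floats standing in for images and robot action vectors.

```python
from typing import Callable, List, Sequence

Frame = List[float]   # stand-in for an image; a real frame would be an H x W x C array
Action = List[float]  # stand-in for a robot action vector


def rollout(
    predict_chunk: Callable[[Frame, Sequence[Action]], List[Frame]],
    initial_frame: Frame,
    actions: Sequence[Action],
    chunk_size: int,
) -> List[Frame]:
    """Autoregressively roll a chunked world model forward over all actions."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    frames = [initial_frame]
    for start in range(0, len(actions), chunk_size):
        chunk_actions = actions[start:start + chunk_size]
        # Condition on the most recent frame, whether observed or generated.
        # (Assumption: a real model may condition on a longer history.)
        new_frames = predict_chunk(frames[-1], chunk_actions)
        if len(new_frames) != len(chunk_actions):
            raise ValueError("model must return one frame per action")
        frames.extend(new_frames)
    return frames


def toy_predict_chunk(frame: Frame, chunk_actions: Sequence[Action]) -> List[Frame]:
    """Hypothetical toy dynamics: add each action element-wise to the previous frame."""
    out, current = [], frame
    for action in chunk_actions:
        current = [f + a for f, a in zip(current, action)]
        out.append(current)
    return out


if __name__ == "__main__":
    actions = [[1.0, 0.0]] * 5
    video = rollout(toy_predict_chunk, [0.0, 0.0], actions, chunk_size=2)
    print(len(video), video[-1])
```

Because each chunk starts from a generated frame rather than a ground-truth one, prediction errors can compound over the rollout, which is why long-horizon consistency is a meaningful property to evaluate.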

Quick Start & Requirements

  • Installation: Create and activate a Conda environment (conda create -n BWM python=3.10.20, then conda activate BWM). Install PyTorch built for CUDA 12.8 (torch==2.8.0, torchvision==0.23.0, torchaudio==2.8.0) and diffsynth==2.0.11, then install the remaining dependencies with pip install -r requirements.txt.
  • Prerequisites: Python 3.10.20, CUDA >= 12.8, PyTorch 2.8.0.
  • Model Weights: Download base model from ModelScope (Wan-AI/Wan2.2-TI2V-5B) and BWM checkpoint from Hugging Face (BLM-Lab/Boundless-World-Model).
  • Inference: Copy example scripts, update MODEL_PATHS and CKPT_PATH in scripts/local.sh, then execute bash scripts/infer_example.sh.
  • Links: No direct quick-start or documentation links are provided.
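Before running inference, it can help to confirm that the active environment matches the pinned versions listed above. The script below is a hedged sketch and not part of the BWM repository: the helper names release_tuple and check_packages are hypothetical, and it checks only the Python and package versions, not the CUDA toolkit or driver.

```python
import re
import sys
from importlib import metadata

# Versions taken from the installation and prerequisite notes above.
REQUIRED_PACKAGES = {
    "torch": "2.8.0",
    "torchvision": "0.23.0",
    "torchaudio": "2.8.0",
    "diffsynth": "2.0.11",
}
REQUIRED_PYTHON = (3, 10)


def release_tuple(version: str) -> tuple:
    """Return the leading numeric release, e.g. '2.8.0+cu128' -> (2, 8, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def check_packages(required: dict) -> dict:
    """Map each package name to 'ok', 'missing', or 'found <version>'."""
    report = {}
    for name, wanted in required.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        # Compare only the numeric release so local tags like '+cu128' are ignored.
        if release_tuple(installed) == release_tuple(wanted):
            report[name] = "ok"
        else:
            report[name] = f"found {installed}"
    return report


if __name__ == "__main__":
    python_ok = sys.version_info[:2] == REQUIRED_PYTHON
    print(f"python {sys.version.split()[0]}: {'ok' if python_ok else 'expected 3.10.x'}")
    for name, status in check_packages(REQUIRED_PACKAGES).items():
        print(f"{name}: {status}")
```

Run it inside the activated BWM environment; any line that does not report ok points to a package to reinstall at the pinned version.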

Highlighted Details

  • Achieves high-fidelity visual realism and long-horizon physical consistency in complex robotic manipulation tasks, demonstrated in CVPR 2026 WorldArena Challenge simulations.
  • Successfully simulates diverse scenarios including compositional spatial rearrangement, articulated hinge interaction, fine-grained affordance interaction, bimanual coordination, and long-horizon constrained placement.
  • Exhibits out-of-distribution generalization capabilities, adapting to novel initial scenes generated by GPT-Image-2 and object appearance shifts while preserving action-conditioned dynamics.

Maintenance & Community

  • Key contributors include Wentao Tan, Zengrong Lin, and Yang Sun.
  • No community channels (e.g., Discord, Slack) or roadmap links are provided in the README.

Licensing & Compatibility

  • The README does not specify a license type or compatibility notes for commercial use.

Limitations & Caveats

  • The release is incomplete; model weights, training code, and a technical report are still marked as TODO.
  • Details regarding the framework architecture are pending ("Coming soon!").

Health Check

  • Last commit: 19 hours ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 2
  • Star history: 1,845 stars in the last 30 days
