dreamzero  by dreamzero0

World action models for zero-shot task generalization

Created 4 weeks ago

847 stars

Top 41.9% on SourcePulse

Project Summary

DreamZero provides code for a World Action Model that achieves zero-shot task generalization by jointly predicting actions and video. It targets researchers and engineers in robotics and embodied AI, and ships a pre-trained model, a distributed inference server, and tools for simulation and real-world evaluation.

How It Works

DreamZero predicts future actions and the corresponding video frames simultaneously. This release focuses on the DROID model and uses Diffusion Transformer (DiT) caching to speed up inference. By integrating perception, action, and generation in a single model, the approach targets zero-shot performance on novel tasks without task-specific training.
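
The summary does not say how DreamZero's DiT caching works, so the following is only a generic illustration of one common idea behind such caching: in an iterative denoiser, an expensive block's output is reused across neighbouring steps instead of being recomputed every time. The class name, the `refresh_every` parameter, and the toy functions are all hypothetical and are not taken from the DreamZero code.

```python
class CachedDenoiser:
    """Toy iterative denoiser that recomputes an expensive block only
    every `refresh_every` steps and reuses the cached result otherwise.

    Generic sketch of step-wise feature caching; NOT DreamZero's implementation.
    """

    def __init__(self, expensive_block, cheap_head, refresh_every=2):
        self.expensive_block = expensive_block  # stands in for heavy transformer layers
        self.cheap_head = cheap_head            # stands in for the light per-step update
        self.refresh_every = refresh_every
        self.expensive_calls = 0                # counts how often the heavy block ran
        self._cached = None

    def step(self, x, step_index):
        # Refresh the cache on the first step and then every `refresh_every` steps.
        if self._cached is None or step_index % self.refresh_every == 0:
            self._cached = self.expensive_block(x)
            self.expensive_calls += 1
        return self.cheap_head(x, self._cached)

    def run(self, x, num_steps):
        self._cached = None  # start each rollout with an empty cache
        for i in range(num_steps):
            x = self.step(x, i)
        return x


# Toy stand-ins so the sketch runs without any ML framework.
def toy_block(x):
    return 0.5 * x


def toy_head(x, features):
    return x - 0.1 * features


cached = CachedDenoiser(toy_block, toy_head, refresh_every=2)
uncached = CachedDenoiser(toy_block, toy_head, refresh_every=1)
cached.run(1.0, num_steps=8)
uncached.run(1.0, num_steps=8)
print(cached.expensive_calls, uncached.expensive_calls)
```

With eight steps, the cached variant runs the heavy block on steps 0, 2, 4, and 6 only, while the uncached variant runs it on every step. The trade-off in any scheme of this kind is that reused features are slightly stale, so the refresh interval balances speed against output quality.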

Quick Start & Requirements

  • Python: 3.11
  • Hardware: Multi-GPU setup (tested on GB200, H100), minimum 2 GPUs for distributed inference.
  • CUDA: 12.9+
  • Installation: Create and activate a Conda environment (conda create -n dreamzero python=3.11, then conda activate dreamzero), then run pip install -e . --extra-index-url https://download.pytorch.org/whl/cu129. Depending on hardware, additional dependencies such as flash-attn and Transformer Engine may be required.
  • Pretrained Checkpoint: Download from Hugging Face: GEAR-Dreams/DreamZero-DROID.
  • Inference Server: Launch via torch.distributed.run with specified model path and port.
  • Simulation Evaluation: Requires cloning the sim-evals repository (URL not provided) and running eval_utils/run_sim_eval.py after obtaining API access.
  • Links: Project Page, Paper.
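
The requirements above can be checked and tied together with a small helper script. This is a minimal sketch, not part of the DreamZero repository: the Python version, GPU minimum, and checkpoint repository name come from the list above, `snapshot_download` is the standard `huggingface_hub` call, and `--standalone` / `--nproc_per_node` are standard `torch.distributed.run` options. The server script path and the `--model-path` / `--port` flag names are hypothetical placeholders, since the summary does not give them; consult the repository README for the real invocation.

```python
import shlex
import sys

MIN_GPUS = 2                     # minimum for distributed inference, per the list above
REQUIRED_PYTHON = (3, 11)        # Python version named in the list above
CHECKPOINT_REPO = "GEAR-Dreams/DreamZero-DROID"


def preflight(python_version, gpu_count):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if tuple(python_version[:2]) != REQUIRED_PYTHON:
        problems.append(
            f"Python 3.11 expected, found {python_version[0]}.{python_version[1]}"
        )
    if gpu_count < MIN_GPUS:
        problems.append(f"at least {MIN_GPUS} GPUs needed, found {gpu_count}")
    return problems


def download_checkpoint(local_dir):
    """Fetch the pretrained weights (requires huggingface_hub and network access)."""
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=CHECKPOINT_REPO, local_dir=local_dir)


def launch_command(server_script, model_path, port, gpus=MIN_GPUS):
    """Build a torch.distributed.run command line as an argv list.

    The script path and the --model-path / --port flags are HYPOTHETICAL
    placeholders, not names confirmed by the DreamZero README.
    """
    return [
        sys.executable, "-m", "torch.distributed.run",
        "--standalone", f"--nproc_per_node={gpus}",
        server_script, "--model-path", model_path, "--port", str(port),
    ]


if __name__ == "__main__":
    try:
        import torch
        gpu_count = torch.cuda.device_count()
    except ImportError:
        gpu_count = 0  # torch not installed yet, so no GPUs can be detected
    for problem in preflight(sys.version_info, gpu_count):
        print("WARNING:", problem)
    # Print the command rather than running it, so it can be reviewed first.
    print(shlex.join(launch_command("path/to/server.py", "./checkpoints/DreamZero-DROID", 8000)))
```

Printing the command instead of executing it keeps the sketch safe to run on a machine that is not yet fully set up; the returned argv list can be passed to `subprocess.run` once the placeholders are replaced with the real script and flags.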

Highlighted Details

  • Distributed inference server with DiT caching, with reported inference times of roughly 0.6s on GB200 and 3s on H100.
  • Supports DROID simulation evaluation and RoboArena (DROID real-world) integration.
  • Includes LoRA training scripts for DreamZero on the DROID dataset and full fine-tuning capabilities.
  • Enables video generation and saving in MP4 format.

Maintenance & Community

No specific community channels (e.g., Discord, Slack) or detailed maintenance signals (e.g., active contributors, sponsorships) are explicitly mentioned in the provided README.

Licensing & Compatibility

Licensed under the Apache License 2.0. This license is generally permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

Support for new embodiments (e.g., YAM) and for the PolaRiS and Genie 3.0 simulation environments is listed as a future enhancement rather than a current feature. Simulation evaluation requires a separate setup and an API access request.

Health Check

  • Last Commit: 4 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 4
  • Issues (30d): 11
  • Star History: 854 stars in the last 29 days