VideoWorld by ByteDance-Seed

Generative model for knowledge learning from unlabeled videos (CVPR 2025 paper)

created 6 months ago
606 stars

Top 54.8% on sourcepulse

Project Summary

VideoWorld is a generative model that learns complex knowledge and skills purely from unlabeled video data. It is aimed at researchers in computer vision and AI. The project sets out to show that visual observation alone can be sufficient for learning tasks such as Go and robotic control, rivaling traditional reinforcement learning approaches without explicit search or reward mechanisms.

How It Works

VideoWorld uses a latent dynamics model (LDM) to compress multi-frame visual changes into compact latent codes. An autoregressive transformer then models sequences of these codes, which lets it predict future states and learn sequential dependencies. By focusing on salient visual transitions, this design is intended to make knowledge acquisition more efficient and effective.
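
The two-stage idea above can be illustrated with a toy sketch. This is not the VideoWorld implementation: frames are short lists of floats, the "latent dynamics" step is a nearest-neighbour lookup in a hand-written codebook, and a bigram counter stands in for the autoregressive transformer. All function names (`change_vector`, `nearest_code`, `encode_video`, `fit_bigram`, `predict_next`) are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence

Frame = List[float]  # toy stand-in for an image; real frames would be tensors


def change_vector(frames: Sequence[Frame], t: int, horizon: int) -> List[float]:
    """Concatenate the differences between frame t and the next `horizon` frames."""
    base = frames[t]
    vec: List[float] = []
    for h in range(1, horizon + 1):
        future = frames[t + h]
        vec.extend(f - b for f, b in zip(future, base))
    return vec


def nearest_code(vec: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest to `vec` (squared L2 distance)."""
    def dist(entry: Sequence[float]) -> float:
        return sum((v - e) ** 2 for v, e in zip(vec, entry))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


def encode_video(frames: Sequence[Frame],
                 codebook: Sequence[Sequence[float]],
                 horizon: int) -> List[int]:
    """Compress each multi-frame change into one discrete code (assumed toy scheme)."""
    return [nearest_code(change_vector(frames, t, horizon), codebook)
            for t in range(len(frames) - horizon)]


def fit_bigram(code_sequences: Sequence[Sequence[int]]) -> Dict[int, Counter]:
    """Count code-to-next-code transitions; a crude stand-in for a transformer."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for seq in code_sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return counts


def predict_next(counts: Dict[int, Counter], code: int) -> Optional[int]:
    """Predict the most frequent successor of `code`, or None if it was never seen."""
    if code not in counts:
        return None
    return counts[code].most_common(1)[0][0]


# One-pixel "video" that brightens and then stays constant.
frames = [[0.0], [1.0], [2.0], [2.0], [2.0]]
# Hand-written codebook over 2-step change vectors (an assumption, not learned).
codebook = [[0.0, 0.0], [1.0, 1.0], [1.0, 2.0]]

codes = encode_video(frames, codebook, horizon=2)
model = fit_bigram([codes])
```

Here `codes` is `[2, 1, 0]`: each entry summarizes how the next two frames differ from the current one, and `predict_next(model, 2)` returns `1`. In the real system the codebook would be learned and the sequence model would be a transformer; the sketch only shows how compressing changes into discrete codes turns video prediction into next-token prediction.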

Quick Start & Requirements

  • Install: Clone the repository, then run cd VideoWorld followed by bash install.sh.
  • Prerequisites: Python 3.10, PyTorch 2.1.0 with CUDA 12.1.
  • Dependencies: Includes KataGo for Go battles and CALVIN for robotics. Automated installation scripts are provided.
  • Resources: Requires downloading pre-trained weights for LDM initialization and Go battle inference.
  • Links: Project Page, Paper, Weights.
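
Before running the install script, it can help to confirm that the local environment matches the listed prerequisites. The following is a minimal sketch, assuming the versions above are meant as exact targets (the source does not say whether newer versions also work). The helper names `check_versions` and `detect_environment` are hypothetical; `torch.__version__` and `torch.version.cuda` are standard PyTorch attributes.

```python
import sys
from typing import List, Optional, Tuple

# Versions taken from the prerequisites list; treating them as exact is an assumption.
EXPECTED_PYTHON = (3, 10)
EXPECTED_TORCH = "2.1.0"
EXPECTED_CUDA = "12.1"


def check_versions(python: Tuple[int, int],
                   torch_version: Optional[str],
                   cuda_version: Optional[str]) -> List[str]:
    """Return a list of human-readable mismatches (empty if everything matches)."""
    problems: List[str] = []
    if tuple(python) != EXPECTED_PYTHON:
        problems.append(f"Python {python[0]}.{python[1]} found, expected 3.10")
    if torch_version is None:
        problems.append("PyTorch is not installed")
        return problems
    # Drop a local build suffix such as "+cu121" before comparing.
    if torch_version.split("+")[0] != EXPECTED_TORCH:
        problems.append(f"PyTorch {torch_version} found, expected {EXPECTED_TORCH}")
    if cuda_version != EXPECTED_CUDA:
        problems.append(f"CUDA build {cuda_version} found, expected {EXPECTED_CUDA}")
    return problems


def detect_environment() -> Tuple[Tuple[int, int], Optional[str], Optional[str]]:
    """Read the running Python version and, if available, PyTorch's versions."""
    python = (sys.version_info[0], sys.version_info[1])
    try:
        import torch
    except ImportError:
        return python, None, None
    return python, str(torch.__version__), torch.version.cuda


if __name__ == "__main__":
    issues = check_versions(*detect_environment())
    for line in issues or ["Environment matches the listed prerequisites."]:
        print(line)
```

The check is deliberately read-only: it reports mismatches and leaves installing or downgrading packages to install.sh or to the user.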

Highlighted Details

  • Reaches a 5-dan professional level in Go using a 300M-parameter model.
  • Generalizes across robotic control tasks (CALVIN, RLBench), approaching oracle model performance.
  • Introduces Video-GoBench, a large-scale video-based Go dataset.
  • Explores learning knowledge from purely visual data, in contrast to the text-based learning of LLMs.

Maintenance & Community

  • Project accepted to CVPR 2025.
  • Code, dataset, and models are open-sourced.
  • Primary contributors and affiliations are listed in the paper.

Licensing & Compatibility

  • The repository does not explicitly state a license. The pre-trained weights hosted on Hugging Face may be subject to separate terms.

Limitations & Caveats

  • The provided installation scripts may require manual intervention for specific dependencies like KataGo.
  • Inference may require patching the transformers library to work around a bos_token_id issue.
  • The license status for commercial use or closed-source linking is unclear.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 2
  • Star history: 58 stars in the last 90 days
