AlmondGod
Minimal world model for generating interactive video
Top 37.0% on SourcePulse
TinyWorlds is a minimal, educational implementation of DeepMind's Genie world model architecture. It tackles the problem of scaling world models on internet video that carries no action labels by inferring the actions between frames. Aimed at engineers and researchers, it offers a compact, readable codebase for exploring the autoregressive, unsupervised methods likely used by DeepMind to build scalable world models.
How It Works
The project uses an autoregressive transformer over discrete tokens, which turns next-frame prediction into classification over a finite vocabulary. It has three core components. The Video Tokenizer, a finite scalar quantization (FSQ) VAE, compresses video frames into a small set of discrete tokens. The Action Tokenizer infers action tokens between frames without explicit labels. The Dynamics Model, inspired by MaskGIT and BERT, predicts future frame tokens conditioned on past video tokens and the inferred action tokens. Because both the dynamics and the actions are learned from the video itself, the world model can be trained at scale on unlabeled video data.
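To make the FSQ step in the Video Tokenizer concrete, here is a minimal sketch of finite scalar quantization on a single latent vector. It is not taken from the TinyWorlds codebase: the function names, the level counts, and the simplified rounding rule are assumptions chosen for illustration, and a real implementation would operate on batched tensors and use a straight-through gradient estimator.

```python
import math

def fsq_quantize(z, levels):
    """Map each latent value to an integer code in [0, levels[i] - 1].

    Simplified sketch: squash with tanh, rescale to the level range, round.
    """
    codes = []
    for value, num_levels in zip(z, levels):
        unit = (math.tanh(value) + 1.0) / 2.0      # bounded to [0, 1]
        codes.append(round(unit * (num_levels - 1)))
    return codes

def codes_to_token(codes, levels):
    """Combine per-dimension codes into one token id (mixed-radix encoding)."""
    token = 0
    for code, num_levels in zip(codes, levels):
        token = token * num_levels + code
    return token

def token_to_codes(token, levels):
    """Invert codes_to_token."""
    codes = []
    for num_levels in reversed(levels):
        codes.append(token % num_levels)
        token //= num_levels
    return list(reversed(codes))

# Hypothetical configuration: 3 latent dimensions with 8, 5, and 5 levels,
# giving an implicit codebook of 8 * 5 * 5 = 200 tokens.
LEVELS = [8, 5, 5]
codes = fsq_quantize([0.3, -10.0, 10.0], LEVELS)
token = codes_to_token(codes, LEVELS)
```

The vocabulary size is the product of the level counts, so no learned codebook is needed. Each token id is simply a position on a fixed grid in latent space.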
Quick Start & Requirements
To install, clone the repository and install the requirements with pip install -r requirements.txt. A WANDB_API_KEY is required. Datasets such as zelda_frames.h5 or sonic_frames.h5 must be downloaded from Hugging Face using the provided scripts. Start training with python scripts/full_train.py; inference can be run after pulling pre-trained checkpoints. The project supports acceleration through torch.compile, Distributed Data Parallel (DDP), Automatic Mixed Precision (AMP), and TF32 training.
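The prerequisites above can be checked before launching a run. The sketch below is a hypothetical preflight helper, not part of the repository: the function names are invented, and because the README summary does not say where the dataset files are stored, the caller passes in whatever filenames it has found.

```python
import sys

DATASET_NAMES = ("zelda_frames.h5", "sonic_frames.h5")

def preflight(env, files_present):
    """Return a list of problems; an empty list means the run can start.

    env: mapping of environment variables (for example os.environ).
    files_present: set of dataset filenames the caller has located.
    """
    problems = []
    if not env.get("WANDB_API_KEY"):
        problems.append("WANDB_API_KEY is not set")
    if not any(name in files_present for name in DATASET_NAMES):
        problems.append("no dataset file found, expected one of: "
                        + ", ".join(DATASET_NAMES))
    return problems

def train_command():
    """Build the training command named in the quick start."""
    return [sys.executable, "scripts/full_train.py"]
```

A caller might run preflight(os.environ, found_files), print any problems, and pass train_command() to subprocess.run only when the list is empty.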
Maintenance & Community
The project appears open for contributions, with a "Next Steps" section detailing numerous planned enhancements and areas for improvement. No specific community channels (e.g., Discord, Slack) or formal maintenance structures are detailed in the provided README.
Licensing & Compatibility
The provided README does not specify a software license. Users should verify licensing terms before adoption, especially for commercial use.
Limitations & Caveats
Described as a "minimal implementation," TinyWorlds is intended for understanding and extension rather than immediate production deployment. Mixture of Experts, advanced positional embeddings, and Fully Sharded Data Parallel (FSDP) training are listed as future work, which suggests the project is still under active development.