Dataset of compressed driving video for world models
This repository provides commaVQ, a large-scale dataset of compressed driving videos and associated models for world modeling and video prediction. It targets researchers and engineers in autonomous driving and AI, offering a foundation for building predictive world models on top of VQ-VAE-compressed video.
How It Works
The project uses a VQ-VAE to compress raw driving video frames into grids of discrete 10-bit tokens (a 1024-entry codebook). A GPT-style world model is then trained on millions of minutes of these compressed video segments to predict future tokens, and hence future frames. Working in this compact token space makes representing and predicting complex driving scenarios efficient, enabling more sample-efficient world models.
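As a rough illustration of this token pipeline, the sketch below uses assumed shapes (an 8x16 token grid per frame, 20 frames per second) and a stub in place of the trained GPT; these specifics are illustrative assumptions, not the repository's exact spec.

```python
import numpy as np

# Illustrative assumptions, not the exact commaVQ spec:
CODEBOOK_SIZE = 1024       # 10-bit tokens -> 2**10 codebook entries
TOKENS_PER_FRAME = 8 * 16  # assumed spatial grid of the VQ-VAE latent

rng = np.random.default_rng(0)

# A one-second clip at 20 fps becomes a short sequence of token grids.
clip = rng.integers(0, CODEBOOK_SIZE, size=(20, TOKENS_PER_FRAME))

# A GPT-style world model consumes the flattened token stream and is
# trained to predict the next token; this stub just returns zero logits.
def world_model_logits(token_stream):
    return np.zeros((len(token_stream), CODEBOOK_SIZE))

stream = clip.reshape(-1)          # flatten frames into one 1D sequence
logits = world_model_logits(stream)
print(stream.shape, logits.shape)  # (2560,) (2560, 1024)
```

Sampling from the predicted next-token distribution and decoding the tokens back through the VQ-VAE decoder is what turns such a model into a video predictor.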
Quick Start & Requirements
Requires the Hugging Face datasets library:

pip install datasets

Then load the dataset from the Hugging Face Hub:

from datasets import load_dataset
ds = load_dataset('commaai/commavq', num_proc=40)
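Once loaded, each example holds token ids for a video segment. As a hedged sketch (the shape and dtype below are assumptions for illustration, not the dataset's documented schema), a dummy token array can stand in for one segment to show typical preprocessing:

```python
import numpy as np

# Stand-in for one segment's tokens: the (1200, 8, 16) shape
# (1200 frames, 8x16 token grid) is an illustrative assumption.
rng = np.random.default_rng(0)
tokens = rng.integers(0, 1024, size=(1200, 8, 16), dtype=np.int64)

# Flatten each frame's token grid into a 1D sequence, the form a
# transformer world model would consume.
seq = tokens.reshape(tokens.shape[0], -1)
print(seq.shape)  # (1200, 128)
```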
Limitations & Caveats
The project's primary focus is on compressed video data and world modeling; it does not provide end-to-end self-driving solutions or pre-trained models for direct deployment in vehicles. The compression challenge has concluded.