Video dataset for long video generation research
MiraData is a large-scale video dataset designed to address limitations in existing datasets for long video generation, particularly concerning video duration and structured captions. It targets researchers and developers working on advanced video generation models, offering extended video clips and detailed, multi-faceted descriptions to improve temporal consistency and motion understanding.
How It Works
MiraData comprises video clips with an average duration of 72 seconds, significantly longer than typical datasets. Each clip is accompanied by structured captions generated using GPT-4V, providing detailed descriptions of main objects, background, style, camera movement, and overall content. This approach aims to offer richer semantic information for training and evaluating video generation models.
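As a rough illustration of the structured captions described above, the sketch below models one clip record in Python. It is not the dataset's actual schema: the class names, field names, and helper functions are all hypothetical, chosen only to mirror the five caption aspects (main objects, background, style, camera movement, overall content) and the per-clip duration.

```python
from dataclasses import dataclass, asdict
from statistics import mean


@dataclass
class StructuredCaption:
    # Hypothetical field names, one per caption aspect listed in the text.
    main_object: str
    background: str
    style: str
    camera_movement: str
    overall: str


@dataclass
class ClipRecord:
    # Hypothetical record layout; the real metadata format may differ.
    clip_id: str
    duration_seconds: float
    caption: StructuredCaption


def to_prompt(caption: StructuredCaption) -> str:
    """Join the caption aspects into one labeled text prompt."""
    parts = asdict(caption)  # keeps the field declaration order
    return " ".join(
        f"{name.replace('_', ' ')}: {text}" for name, text in parts.items()
    )


def average_duration(records: list[ClipRecord]) -> float:
    """Mean clip length in seconds over a list of records."""
    return mean(r.duration_seconds for r in records)
```

Keeping each aspect in its own field, rather than in one free-text string, would let a training pipeline select or drop aspects (for example, conditioning only on camera movement) without re-parsing the caption.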
Quick Start & Requirements
Install the dependencies:
pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116
pip install -r requirements.txt
Download video samples:
python download_data.py
Run the scoring script:
python calculate_score.py --meta_file ...
Highlighted Details
caption_gpt4v.py
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The dataset is primarily for informational purposes, and the copyright of the videos belongs to their original owners. Users agree not to reproduce, duplicate, copy, sell, trade, resell, or exploit any portion of the videos or derived data for commercial purposes.