MiraData by mira-space

Video dataset for long video generation research

created 1 year ago
462 stars

Top 66.8% on sourcepulse

Project Summary

MiraData is a large-scale video dataset designed to address limitations in existing datasets for long video generation, particularly concerning video duration and structured captions. It targets researchers and developers working on advanced video generation models, offering extended video clips and detailed, multi-faceted descriptions to improve temporal consistency and motion understanding.

How It Works

MiraData comprises video clips with an average duration of 72 seconds, considerably longer than the clips in typical video datasets. Each clip is accompanied by structured captions generated with GPT-4V, which describe the main objects, background, style, camera movement, and overall content. The aim is to provide richer semantic information for training and evaluating video generation models.
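To make the clip-length figure concrete, here is a minimal sketch of how a mean clip duration could be computed from a metadata table. The column names `clip_id` and `seconds` and the inline rows are assumptions made up for illustration; the actual meta files may use a different layout.

```python
import csv
import io
from statistics import mean

# Hypothetical metadata excerpt: column names and values are illustrative only,
# not taken from the real MiraData meta files.
SAMPLE_META = """clip_id,seconds
a,60.0
b,84.0
c,72.0
"""


def mean_clip_duration(meta_csv: str, column: str = "seconds") -> float:
    """Return the mean of a numeric duration column in CSV text."""
    reader = csv.DictReader(io.StringIO(meta_csv))
    durations = [float(row[column]) for row in reader]
    if not durations:
        raise ValueError("no rows found in metadata")
    return mean(durations)


if __name__ == "__main__":
    print(f"mean duration: {mean_clip_duration(SAMPLE_META):.1f} s")
```

With a real meta file, the same function could be given the file's text and whichever column holds the clip length in seconds.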

Quick Start & Requirements

  • Install dependencies: first run pip install torch==1.12.1+cu116 torchvision==0.13.1+cu116 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu116, then run pip install -r requirements.txt.
  • Download meta files from Google Drive or HuggingFace Dataset.
  • Use python download_data.py to download video samples.
  • For evaluation using MiraBench: python calculate_score.py --meta_file ...
  • The pinned PyTorch wheels (+cu116) are built for CUDA 11.6.
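Because the install step pins a specific PyTorch build, it can help to confirm that the environment actually matches it. The following is a minimal sketch using only the standard library; the helper names `split_torch_version`, `matches_pinned_build`, and `installed_torch_version` are hypothetical and not part of the MiraData repository.

```python
import importlib.metadata
from typing import Optional, Tuple

# Values taken from the pinned install command in the list above.
EXPECTED_TORCH = "1.12.1"
EXPECTED_CUDA_TAG = "cu116"


def split_torch_version(version: str) -> Tuple[str, Optional[str]]:
    """Split '1.12.1+cu116' into ('1.12.1', 'cu116'); the tag is None if absent."""
    base, _, tag = version.strip().partition("+")
    return base, (tag or None)


def matches_pinned_build(version: str) -> bool:
    """Check a version string against the pinned release and CUDA tag."""
    base, tag = split_torch_version(version)
    return base == EXPECTED_TORCH and tag == EXPECTED_CUDA_TAG


def installed_torch_version() -> Optional[str]:
    """Return the installed torch distribution version, or None if not installed."""
    try:
        return importlib.metadata.version("torch")
    except importlib.metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_torch_version()
    if found is None:
        print("torch is not installed")
    elif matches_pinned_build(found):
        print(f"torch {found} matches the pinned build")
    else:
        print(f"torch {found} differs from {EXPECTED_TORCH}+{EXPECTED_CUDA_TAG}")
```

This only compares version strings; it does not verify that a CUDA 11.6 compatible driver is present on the machine.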

Highlighted Details

  • The dataset is released in four versions of different sizes: 330K, 93K, 42K, and 9K.
  • Captions include six types: short, dense, background, main object, style, and camera.
  • MiraBench introduces 17 evaluation metrics across 6 perspectives for long video generation.
  • GPT-4V was used for captioning, with a prompt strategy detailed in caption_gpt4v.py.
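The six caption types listed above could be represented as a small record type. This is a sketch under stated assumptions: the class name `StructuredCaption`, its field names, and the `to_prompt` helper are hypothetical, and the example text is invented rather than taken from the dataset.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class StructuredCaption:
    """One clip's captions, one field per caption type (field names are assumed)."""
    short: str
    dense: str
    background: str
    main_object: str
    style: str
    camera: str

    def to_prompt(self) -> str:
        """Join the non-empty caption fields into one labeled text block."""
        parts = []
        for f in fields(self):
            value = getattr(self, f.name).strip()
            if value:
                parts.append(f"{f.name}: {value}")
        return "\n".join(parts)


if __name__ == "__main__":
    # Invented example text, for illustration only.
    caption = StructuredCaption(
        short="A cyclist rides along a coastal road.",
        dense="A cyclist in a red jacket rides along a winding coastal road at sunset.",
        background="Cliffs and the sea under an orange sky.",
        main_object="A cyclist in a red jacket.",
        style="Realistic, warm tones.",
        camera="Tracking shot from behind.",
    )
    print(caption.to_prompt())
```

Keeping each caption type in its own field makes it straightforward to condition on a subset of them, for example only the dense and camera captions.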

Maintenance & Community

Licensing & Compatibility

  • MiraData is released under the GPL-v3 license.
  • The project states that commercial usage is supported and asks users to contact the maintainers for a commercial license.
  • Copyright of videos remains with original owners; dataset is for informational purposes. Commercial use of the videos themselves is restricted.

Limitations & Caveats

The dataset is primarily for informational purposes, and the copyright of the videos belongs to their original owners. Users agree not to reproduce, duplicate, copy, sell, trade, resell, or exploit any portion of the videos or derived data for commercial purposes.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 2
  • Star history: 26 stars in the last 90 days
