google-deepmind: Benchmark for generative video models' physical reasoning
This benchmark addresses the critical need for evaluating the physical understanding capabilities of generative video models. It offers a high-quality, realistic dataset and a standardized evaluation framework, enabling researchers and developers to rigorously assess and compare their models' grasp of real-world physics, thereby advancing the field of AI video generation.
How It Works
The Physics-IQ benchmark operates in two phases: video generation and evaluation. Users generate videos using their models, adhering to specific input requirements for Image-to-Video (I2V) or Multiframe-to-Video (V2V) architectures. These generated videos, crucially trimmed to exactly five seconds, are then processed by an evaluation script that compares them against ground truth data, producing a Physics-IQ score. The benchmark's novelty lies in its use of real-world, high-resolution videos covering diverse physical phenomena, filmed from multiple perspectives, offering a more robust assessment than synthetic datasets.
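The comparison against ground truth can be illustrated with a toy IoU-style metric over binary motion masks. This is a sketch only, not the benchmark's actual scoring code; the `spatial_iou` helper and the toy masks are hypothetical, chosen to show the general shape of a mask-overlap comparison.

```python
import numpy as np

def spatial_iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection-over-union between two binary motion masks.

    Masks may be per-frame (H, W) or spatiotemporal (T, H, W).
    Returns 1.0 when both masks are empty (perfect trivial agreement).
    """
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(inter) / float(union) if union else 1.0

# Toy example: a generated motion mask vs. a ground-truth mask.
gen = np.zeros((4, 4), dtype=bool)
gen[:2, :2] = True    # 4 pixels of predicted motion
real = np.zeros((4, 4), dtype=bool)
real[:2, :3] = True   # 6 pixels of observed motion

print(spatial_iou(gen, real))  # intersection 4 px, union 6 px -> 2/3
```

A real evaluation would aggregate such per-scenario scores (and other metrics) across the dataset's multiple camera perspectives to produce a single Physics-IQ score.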
Quick Start & Requirements
- Install gsutil (pip install gsutil) and the project dependencies (pip install -r requirements.txt); ensure Python 3 is available.
- Download the dataset (python3 ./code/download_physics_iq_data.py). It must follow a specific directory structure (full-videos/, split-videos/, switch-frames/, video-masks/).
- Trim every generated video to exactly five seconds with ffmpeg (e.g., ffmpeg -y -i "$v" -t 5 -r 24 "generated_videos_5s/$(basename "$v")"). Videos of other durations are incompatible with the evaluation script.
- Run the evaluation: python3 code/run_physics_iq.py --input_folders <generated_videos_dirs> --output_folder <output_dir> --descriptions_file <descriptions_file>.

Highlighted Details
Maintenance & Community
No specific details regarding community channels, active contributors, or roadmap are provided in the README.
Licensing & Compatibility
Limitations & Caveats
Users who accessed the dataset or toolbox before February 19, 2025, are strongly advised to re-download the dataset and re-run evaluations due to improvements and changes in the benchmark. A critical requirement is that all generated videos must be precisely trimmed to five seconds; otherwise, they will be incompatible with the evaluation script.
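The five-second requirement can be met by scripting the ffmpeg trim over a whole directory. The sketch below only builds the commands (so it runs without ffmpeg installed); the directory names are hypothetical and mirror the ffmpeg invocation shown in Quick Start.

```python
import shlex
from pathlib import Path

def trim_command(src: Path, out_dir: Path, duration: int = 5, fps: int = 24) -> str:
    """Build an ffmpeg command that trims `src` to `duration` seconds at `fps`,
    writing the result under `out_dir` with the same file name."""
    out = out_dir / src.name
    return shlex.join([
        "ffmpeg", "-y",
        "-i", str(src),
        "-t", str(duration),   # keep only the first `duration` seconds
        "-r", str(fps),        # resample to a fixed frame rate
        str(out),
    ])

# Hypothetical paths for illustration; in practice, iterate over your
# model's output directory and run each command with subprocess.run().
cmd = trim_command(Path("generated_videos/ball_drop.mp4"),
                   Path("generated_videos_5s"))
print(cmd)
```

Running such a pre-processing pass before invoking run_physics_iq.py avoids the incompatibility described above, since clips of any other duration are rejected by the evaluation script.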