google-deepmind: Benchmark for generative video models' physical reasoning
This benchmark addresses the critical need for evaluating the physical understanding capabilities of generative video models. It offers a high-quality, realistic dataset and a standardized evaluation framework, enabling researchers and developers to rigorously assess and compare their models' grasp of real-world physics, thereby advancing the field of AI video generation.
How It Works
The Physics-IQ benchmark operates in two phases: video generation and evaluation. Users generate videos using their models, adhering to specific input requirements for Image-to-Video (I2V) or Multiframe-to-Video (V2V) architectures. These generated videos, crucially trimmed to exactly five seconds, are then processed by an evaluation script that compares them against ground truth data, producing a Physics-IQ score. The benchmark's novelty lies in its use of real-world, high-resolution videos covering diverse physical phenomena, filmed from multiple perspectives, offering a more robust assessment than synthetic datasets.
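The comparison against ground truth can be illustrated with a toy IoU-style metric over binary motion masks. This is a sketch only, not the benchmark's actual scoring code; the `spatial_iou` helper and the toy masks are hypothetical, chosen to show the general shape of a mask-overlap comparison.

```python
import numpy as np

def spatial_iou(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Intersection-over-union between two binary motion masks.

    Masks may be per-frame (H, W) or spatiotemporal (T, H, W).
    Returns 1.0 when both masks are empty (perfect trivial agreement).
    """
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return float(inter) / float(union) if union else 1.0

# Toy example: a generated motion mask vs. a ground-truth mask.
gen = np.zeros((4, 4), dtype=bool)
gen[:2, :2] = True    # 4 pixels of predicted motion
real = np.zeros((4, 4), dtype=bool)
real[:2, :3] = True   # 6 pixels of observed motion

print(spatial_iou(gen, real))  # intersection 4 px, union 6 px -> 2/3
```

A real evaluation would aggregate such per-scenario scores (and other metrics) across the dataset's multiple camera perspectives to produce a single Physics-IQ score.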
Quick Start & Requirements
- Install gsutil (pip install gsutil) and the project dependencies (pip install -r requirements.txt); ensure Python 3 is available.
- Download the dataset (python3 ./code/download_physics_iq_data.py). It must follow a specific directory structure (full-videos/, split-videos/, switch-frames/, video-masks/).
- Trim every generated video to exactly five seconds with ffmpeg (e.g., ffmpeg -y -i "$v" -t 5 -r 24 "generated_videos_5s/$(basename "$v")"). Videos of other durations are incompatible with the evaluation script.
- Run the evaluation: python3 code/run_physics_iq.py --input_folders <generated_videos_dirs> --output_folder <output_dir> --descriptions_file <descriptions_file>.

Highlighted Details
Maintenance & Community
No specific details regarding community channels, active contributors, or roadmap are provided in the README.
Licensing & Compatibility
Limitations & Caveats
Users who accessed the dataset or toolbox before February 19, 2025, are strongly advised to re-download the dataset and re-run evaluations due to improvements and changes in the benchmark. A critical requirement is that all generated videos must be precisely trimmed to five seconds; otherwise, they will be incompatible with the evaluation script.
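The five-second requirement can be met by scripting the ffmpeg trim over a whole directory. The sketch below only builds the commands (so it runs without ffmpeg installed); the directory names are hypothetical and mirror the ffmpeg invocation shown in Quick Start.

```python
import shlex
from pathlib import Path

def trim_command(src: Path, out_dir: Path, duration: int = 5, fps: int = 24) -> str:
    """Build an ffmpeg command that trims `src` to `duration` seconds at `fps`,
    writing the result under `out_dir` with the same file name."""
    out = out_dir / src.name
    return shlex.join([
        "ffmpeg", "-y",
        "-i", str(src),
        "-t", str(duration),   # keep only the first `duration` seconds
        "-r", str(fps),        # resample to a fixed frame rate
        str(out),
    ])

# Hypothetical paths for illustration; in practice, iterate over your
# model's output directory and run each command with subprocess.run().
cmd = trim_command(Path("generated_videos/ball_drop.mp4"),
                   Path("generated_videos_5s"))
print(cmd)
```

Running such a pre-processing pass before invoking run_physics_iq.py avoids the incompatibility described above, since clips of any other duration are rejected by the evaluation script.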