WorldScore by haoyi-duan

Benchmark for world generation models

Created 1 year ago

294 stars

Top 89.7% on SourcePulse

Project Summary

WorldScore offers a unified benchmark for evaluating world generation models, moving beyond single-scene quality metrics. It enables objective comparison of spatial/temporal coherence, scene generation, and dynamic instruction following for generative models.

How It Works

This benchmark assesses generated worlds using spatial and temporal metrics. It differentiates models by analyzing novel scene creation and adherence to camera movements, providing a more robust measure of world generation capabilities than single-scene video quality assessments.

Quick Start & Requirements

Setup requires cloning the repo and defining three environment variables (WORLDSCORE_PATH, MODEL_PATH, DATA_PATH) in a .env file; the variables must be exported again in every session. Installation uses separate conda environments for generation (Python 3.10) and evaluation (Python 3.10, CUDA 12.1). Dependencies include PyTorch 2.5.1 (CUDA 12.1), torch-scatter, xformers, suitesparse, open3d, and evo, plus third-party integrations that are complex to install (Droid-SLAM, Grounding-SAM, SAM2, VFIMamba). Checkpoints require manual download.

Dataset download: python download.py
Model registration: YAML configs and Python classes
Generation: python world_generators/generate_videos.py
Evaluation: python worldscore/run_evaluate.py
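Because the three path variables have to be present in every session, it can help to verify them before launching generation or evaluation. The following is a minimal sketch, not part of WorldScore itself: it assumes the .env file holds plain KEY=VALUE lines (optionally prefixed with "export"), and the helper names parse_env_file, missing_vars, and load_env are hypothetical. It does not expand references such as $OTHER_VAR inside values.

```python
import os
from pathlib import Path

# Variable names taken from the setup description above.
REQUIRED_VARS = ("WORLDSCORE_PATH", "MODEL_PATH", "DATA_PATH")


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines (assumed format, not confirmed by the README).

    Blank lines and '#' comments are skipped, a leading 'export ' is dropped,
    and surrounding quotes are stripped from values.
    """
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_vars(values: dict[str, str], required=REQUIRED_VARS) -> list[str]:
    """Return the required names that are absent or empty."""
    return [name for name in required if not values.get(name)]


def load_env(path: str = ".env") -> dict[str, str]:
    """Read the file, fail loudly on missing variables, then set os.environ."""
    values = parse_env_file(Path(path).read_text())
    missing = missing_vars(values)
    if missing:
        raise RuntimeError(f"Missing variables in {path}: {', '.join(missing)}")
    os.environ.update(values)  # affects this process and its children only
    return values
```

Calling load_env() at the top of a wrapper script sets the variables for that Python process and any subprocesses it starts; it does not change the parent shell, so commands typed directly into a terminal still need the variables exported there.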

Highlighted Details

Differentiates models by evaluating new scene creation and camera path following.
Supports "threedgen", "fourdgen", and "videogen" model types.
Integrates Droid-SLAM, Grounding-SAM, SAM2, and VFIMamba for comprehensive metric calculation.
Paper accepted to ICCV 2025; recent updates include evaluation code for specific models.
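To illustrate how a model might be registered under one of the three supported types, here is a generic registry sketch. It is an assumption-laden illustration, not WorldScore's actual API: the decorator register_model, the function build_model, the class ToyVideoModel, and its generate method are all hypothetical, and the real project pairs Python classes with YAML configs whose schema is not described here.

```python
# Model type names as listed above; everything else is hypothetical.
MODEL_TYPES = ("threedgen", "fourdgen", "videogen")

# Maps a model name to (model_type, class).
_REGISTRY: dict[str, tuple[str, type]] = {}


def register_model(name: str, model_type: str):
    """Class decorator that records a model under a supported type."""
    if model_type not in MODEL_TYPES:
        raise ValueError(
            f"Unknown model type {model_type!r}; expected one of {MODEL_TYPES}"
        )

    def decorator(cls: type) -> type:
        _REGISTRY[name] = (model_type, cls)
        return cls

    return decorator


@register_model("toy_video_model", "videogen")
class ToyVideoModel:
    """Stand-in model; a real adapter would wrap an actual generator."""

    def generate(self, prompt: str) -> str:
        return f"frames for: {prompt}"


def build_model(name: str):
    """Look up a registered model and instantiate it."""
    model_type, cls = _REGISTRY[name]
    return model_type, cls()
```

Validating the type at registration time surfaces a misspelled type name as soon as the module is imported, rather than partway through a long generation run.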

Maintenance & Community

The README provides no specific details on maintainers, community channels (e.g., Discord, Slack), or project roadmaps.

Licensing & Compatibility

The license type and compatibility notes for commercial use are not specified in the provided README content.

Limitations & Caveats

Environment setup demands careful path configuration, and the variables must be re-exported in every session. Installing the many complex dependencies, particularly the third-party libraries, is time-consuming and potentially error-prone. Model adaptation currently focuses on video generation, so adapting 3D/4D models may require additional work.

Health Check

Last Commit: 4 days ago
Responsiveness: Inactive
Star History: 4 stars in the last 30 days