ml-hypersim by apple

Synthetic dataset for holistic indoor scene understanding research

Created 5 years ago

1,968 stars

Top 21.9% on SourcePulse

6 Experts Love This Project

Aravind Srinivas

Cofounder of Perplexity

Thomas Wolf

Cofounder of Hugging Face

Alex Yu

Research Scientist at OpenAI; Cofounder of Luma AI

Noah Snavely

Research Scientist at Google DeepMind; Professor at Cornell Tech

and 2 more!

Project Summary

Hypersim is a large-scale, photorealistic synthetic dataset designed for comprehensive indoor scene understanding tasks. It provides detailed per-pixel ground truth labels for geometry, semantics, and materials, targeting researchers and engineers in computer vision and robotics. The dataset enables robust training and evaluation of models that require precise scene information, which is often difficult to obtain from real-world data.

How It Works

Hypersim is built from 3D scenes created by professional artists and sourced from Evermotion Archinteriors, rendered into over 77,000 images across 461 indoor environments. Each image is factorized into diffuse reflectance, diffuse illumination, and a non-diffuse residual term, which supports lighting and material analysis. The dataset also includes dense per-pixel semantic instance segmentations, camera information, and 3D bounding boxes for semantic instances, all derived from the underlying scene geometry and the V-Ray rendering pipeline.
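The factorization can be read as a per-pixel identity: color equals diffuse reflectance times diffuse illumination, plus the non-diffuse residual. The following is a minimal sketch of that identity in plain Python, assuming linear (not tone-mapped) RGB values. The function name and the sample values are hypothetical and are not part of the Hypersim toolkit.

```python
def recompose_pixel(reflectance, illumination, residual):
    """Recombine one RGB pixel: color = reflectance * illumination + residual."""
    return tuple(
        r * l + e for r, l, e in zip(reflectance, illumination, residual)
    )


# Hypothetical per-channel values for a single pixel (linear RGB).
pixel = recompose_pixel(
    reflectance=(0.5, 0.25, 0.5),
    illumination=(2.0, 4.0, 1.0),
    residual=(0.0, 0.5, 0.25),
)
print(pixel)
```

In practice the same arithmetic would be applied to whole image arrays rather than single pixels; the per-pixel form is used here only to keep the example dependency-free.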

Quick Start & Requirements

Dataset Download: Run the download script (dataset_download_images.py); the full dataset needs approximately 1.9TB of storage.
Toolkit Setup:
- Python 3.7+ (Anaconda recommended).
- Python libraries: h5py, matplotlib, pandas, scikit-learn, mayavi (optional), opencv-python, pillow, joblib, scipy.
- V-Ray Standalone and V-Ray AppSDK (Next Standalone, update 2.1 or later).
- C++ libraries: args, Armadillo, Embree, HDF5, Octomap, OpenEXR (for High-Level Toolkit).
Configuration: Edit the system configuration files (_system_config.py, system_config.inc) so that they point to the installed V-Ray and library paths.
Rendering: The dataset was generated using cloud rendering.
Documentation: Tutorials and detailed instructions are provided within the repository.
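Before starting a download of this size, it can help to confirm that the Python dependencies and the disk space are in place. The sketch below is a hypothetical preflight check, not a script from the repository: the helper names are invented for illustration, the 1.9TB figure is taken from the list above and assumed to mean decimal terabytes, and the import names (for example sklearn for scikit-learn and cv2 for opencv-python) are the usual ones for those packages.

```python
import importlib.util
import shutil

# ~1.9 TB as quoted above (assumption: decimal terabytes).
REQUIRED_BYTES = int(1.9e12)

# pip package name -> module name used at import time.
PYTHON_DEPS = {
    "h5py": "h5py",
    "matplotlib": "matplotlib",
    "pandas": "pandas",
    "scikit-learn": "sklearn",
    "opencv-python": "cv2",
    "pillow": "PIL",
    "joblib": "joblib",
    "scipy": "scipy",
}


def missing_packages(deps=PYTHON_DEPS):
    """Return the pip names of packages whose module cannot be found."""
    return sorted(
        pkg for pkg, mod in deps.items()
        if importlib.util.find_spec(mod) is None
    )


def has_enough_space(path, required_bytes=REQUIRED_BYTES):
    """Return True if the filesystem containing `path` has enough free space."""
    return shutil.disk_usage(path).free >= required_bytes


print("Missing packages:", missing_packages())
print("Enough free space in current directory:", has_enough_space("."))
```

The check covers only the Python side; V-Ray and the C++ libraries still need to be verified separately.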

Highlighted Details

Provides 77,400 images (74,619 in the public release) of 461 indoor scenes.
Includes dense per-pixel semantic instance segmentation and 3D bounding boxes.
Images are factorized into diffuse reflectance, diffuse illumination, and residual components.
Offers lossless HDR image data and lossy preview images.

Maintenance & Community

The project is published by Apple, and the README credits contributors including Mike Roberts, Jason Ramapuram, and Anurag Ranjan. The README does not describe any community channels.

Licensing & Compatibility

The Hypersim Dataset is licensed under the Creative Commons Attribution-ShareAlike 3.0 Unported License (CC BY-SA 3.0). The license of the software toolkit is not explicitly detailed here, but the README notes that the toolkit does not depend on GPL-licensed portions of its open-source dependencies. CC BY-SA 3.0 permits commercial use of the dataset, provided that attribution is given and derivative works are shared under the same license.

Limitations & Caveats

The dataset requires significant storage (1.9TB) and a complex setup involving V-Ray. Some pipeline steps are platform-specific (Windows, macOS/Linux). Assets may be reused across scenes, which can weaken the independence of train/test splits. Obtaining the ground truth triangle meshes requires purchasing the original 3D assets.
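One common way to limit leakage between splits is to assign whole scenes, rather than individual images, to either the training or the test set. The sketch below illustrates that idea in plain Python. The function name, the record layout, and the example scene and frame names are assumptions made for illustration. A scene-level split does not remove asset reuse across different scenes, so any split shipped with the repository should be preferred where available.

```python
import random
from collections import defaultdict


def split_by_scene(image_records, test_fraction=0.2, seed=0):
    """Split (scene_name, image_id) records so no scene spans both sets."""
    by_scene = defaultdict(list)
    for scene, image_id in image_records:
        by_scene[scene].append(image_id)

    # Shuffle scene names deterministically, then carve off the test scenes.
    scenes = sorted(by_scene)
    random.Random(seed).shuffle(scenes)
    n_test = max(1, round(len(scenes) * test_fraction))

    test = [(s, i) for s in scenes[:n_test] for i in by_scene[s]]
    train = [(s, i) for s in scenes[n_test:] for i in by_scene[s]]
    return train, test


# Hypothetical records: 5 scenes with 2 frames each.
records = [
    (f"ai_001_{k:03d}", f"frame.{j:04d}")
    for k in range(1, 6)
    for j in range(2)
]
train, test = split_by_scene(records)
print(len(train), "training images,", len(test), "test images")
```

Fixing the seed keeps the split reproducible across runs, which matters when results from different experiments are compared.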

Health Check

Last Commit

1 month ago

Responsiveness

1 day

Star History

11 stars in the last 30 days