diffusion-renderer by nv-tlabs

Video diffusion for neural inverse and forward rendering

Created 4 months ago
294 stars

Top 89.9% on SourcePulse

Project Summary

DiffusionRenderer is a framework for high-quality geometry and material estimation from videos (inverse rendering) and photorealistic synthesis from G-buffers (forward rendering). It targets computer vision and graphics researchers and engineers, providing a data-driven approach that uses video diffusion models to approximate light transport. This enables realistic relighting and material editing without explicit simulation, which is especially beneficial when scene geometry is imprecise or unavailable.

How It Works

The system employs video diffusion models for both inverse rendering (scene attribute estimation) and forward rendering (image/video synthesis). It learns to approximate light transport from data, generating realistic lighting effects without traditional path tracing or precise geometry. Trained on synthetic and auto-labeled real-world videos, this approach offers an advantage over classic physically based rendering in scenarios with challenging geometry.

Quick Start & Requirements

Installation requires Python 3.10, PyTorch (2.1-2.4) with CUDA 12.1, and the project dependencies (pip install -r requirements.txt). Model weights are available on Hugging Face via the provided download scripts. Inference scripts are included (inference_svd_rgbx.py for inverse rendering, inference_svd_xrgb.py for forward rendering), each requiring a configuration file and input data. High-end GPUs (e.g., A100 80GB, RTX 4090 24GB) are recommended, with memory-saving options available.
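The steps above can be sketched as a shell session. This is a minimal sketch, not the repository's documented commands verbatim: the environment name, the exact PyTorch wheel, the weight-download invocation, and the --config flag and config paths are assumptions; the script names (inference_svd_rgbx.py, inference_svd_xrgb.py) come from the summary above.

```shell
# Minimal setup sketch -- environment name, wheel choice, and paths are assumed.
conda create -n diffusion-renderer python=3.10 -y
conda activate diffusion-renderer

# PyTorch 2.1-2.4 built against CUDA 12.1 (2.4.0 shown as one supported choice)
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121

# Project dependencies
pip install -r requirements.txt

# Fetch model weights from Hugging Face using the repo's download scripts
# (exact script name/location not given in this summary).

# Inverse rendering: estimate geometry/material G-buffers from a video
python inference_svd_rgbx.py --config <path/to/inverse_config>

# Forward rendering: synthesize photorealistic video from G-buffers
python inference_svd_xrgb.py --config <path/to/forward_config>
```

On GPUs below the recommended tier, the README's memory-saving options (e.g., FP16 and VAE chunking, noted under Limitations below) are the first knobs to try.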

Highlighted Details

  • Official implementation for the CVPR'25 Oral paper "DiffusionRenderer: Neural Inverse and Forward Rendering with Video Diffusion Models".
  • Features a "Cosmos DiffusionRenderer" update powered by NVIDIA Cosmos, promising significant quality improvements.
  • Serves as a general-purpose framework for both inverse and forward rendering tasks.

Maintenance & Community

Recent updates include the "Cosmos DiffusionRenderer" and upcoming code releases. The README does not mention community channels or a detailed maintenance roadmap.

Licensing & Compatibility

The code and models are released under the NVIDIA Source Code License, while the base model is under the Stability AI Community License. Users should review both licenses for compatibility, especially for commercial use or integration into closed-source applications, as restrictions may apply.

Limitations & Caveats

Inference demands substantial GPU memory (over 22 GB even with optimizations such as FP16 and VAE chunking). CPU offloading is possible but not recommended due to its performance impact. The provided code corresponds to the academic version; newer, enhanced versions are in development.

Health Check

  • Last commit: 4 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 35 stars in the last 30 days
