Generative model for extreme monocular dynamic novel view synthesis
This repository provides the official implementation for "Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis," a method for generating novel views of dynamic scenes from a single camera input. It is targeted at researchers and practitioners in computer vision and graphics interested in monocular video understanding and synthesis. The project offers pretrained models, inference, training, and evaluation code, along with dataset generation tools.
How It Works
The Generative Camera Dolly (GCD) approach leverages Stable Video Diffusion (SVD) models to perform dynamic novel view synthesis. It processes input videos by first converting them into point cloud representations, which are then used to train a diffusion model capable of generating new views. The system supports both gradual camera movements and direct view synthesis, with options for interpolating camera trajectories or directly predicting future frames.
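As a purely illustrative sketch of what "interpolating camera trajectories" means in the gradual-movement mode, the snippet below blends a source and target camera pose over a clip. The function name, the 4x4 camera-to-world pose convention, and the frame count are assumptions for illustration, not the project's actual API.

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_camera_path(src_pose, dst_pose, num_frames):
    """Hypothetical helper: blend from the source camera pose to the target
    pose over num_frames (a simplified take on gradual camera movement).
    Both poses are 4x4 camera-to-world matrices."""
    # Rotations are interpolated on the sphere (slerp), translations linearly.
    rotations = Rotation.from_matrix(np.stack([src_pose[:3, :3], dst_pose[:3, :3]]))
    slerp = Slerp([0.0, 1.0], rotations)
    path = []
    for t in np.linspace(0.0, 1.0, num_frames):
        pose = np.eye(4)
        pose[:3, :3] = slerp(t).as_matrix()
        pose[:3, 3] = (1.0 - t) * src_pose[:3, 3] + t * dst_pose[:3, 3]
        path.append(pose)
    return np.stack(path)  # shape: (num_frames, 4, 4)
```

A call such as interpolate_camera_path(np.eye(4), target_pose, 14) would yield one pose per generated frame; the actual trajectory handling in the repository is configured through its own scripts and configs.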
Quick Start & Requirements
Create a conda environment (conda create -n gcd python=3.10), activate it (conda activate gcd), install PyTorch with CUDA 12.1, and then install the project dependencies (pip install git+https://github.com/OpenAI/CLIP.git, pip install git+https://github.com/Stability-AI/datapipelines.git, pip install -r requirements.txt).
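After installation, a quick sanity check that the CUDA-enabled PyTorch build is active (plain PyTorch calls, not a GCD script):

```python
import torch

# The CUDA 12.1 wheel should report a CUDA-enabled build and detect a GPU.
print(torch.__version__, torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
```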
Highlighted Details
Maintenance & Community
The project is maintained by Basile Van Hoorick and collaborators from Columbia University and Toyota Research Institute. The codebase has been refactored for public release, with a note that thorough vetting is ongoing. Users are encouraged to report issues and suggest fixes.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. However, it depends on Stable Video Diffusion, which has its own licensing terms. Compatibility for commercial use or closed-source linking would require checking the specific licenses of all dependencies, including SVD.
Limitations & Caveats
The codebase has undergone refactoring and may contain undiscovered issues. The project primarily targets synthetic datasets (Kubric-4D, ParallelDomain-4D) and may perform best on similar data, with a note that models may not perform well on videos containing humans. Some dataset folders for ParallelDomain-4D may be missing frames.