gcd by basilevh

Generative model for extreme monocular dynamic novel view synthesis

created 1 year ago
261 stars

Top 98.0% on sourcepulse

Project Summary

This repository provides the official implementation for "Generative Camera Dolly: Extreme Monocular Dynamic Novel View Synthesis," a method for generating novel views of dynamic scenes from a single camera input. It is targeted at researchers and practitioners in computer vision and graphics interested in monocular video understanding and synthesis. The project offers pretrained models, inference, training, and evaluation code, along with dataset generation tools.

How It Works

The Generative Camera Dolly (GCD) approach builds on Stable Video Diffusion (SVD) to perform dynamic novel view synthesis. Input videos are first converted into point cloud representations, which are then used to train a diffusion model capable of generating new views. At inference time, the camera can either move gradually, following an interpolated trajectory from the source pose to the target pose, or jump directly to the target viewpoint.
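To make the camera-trajectory idea concrete, the sketch below sweeps a camera horizontally around a look-at point, one pose per output frame, covering an arbitrary azimuth (e.g., the 180-degree maximum noted later). It is illustrative only: the function names and conventions (z-up world, world-to-camera rotation matrices) are assumptions, not the repository's actual camera utilities.

```python
# Illustrative camera-orbit interpolation (hypothetical helpers, not GCD's API).
import numpy as np

def look_at(eye, target, up=np.array([0.0, 0.0, 1.0])):
    """World-to-camera rotation whose -z axis points from `eye` at `target`."""
    forward = target - eye
    forward = forward / np.linalg.norm(forward)
    right = np.cross(forward, up)
    right = right / np.linalg.norm(right)
    true_up = np.cross(right, forward)
    return np.stack([right, true_up, -forward])  # 3x3 rotation matrix

def orbit_trajectory(center, radius, height, azimuth_deg, num_frames=14):
    """Poses sweeping `azimuth_deg` horizontally around `center`, one per frame."""
    poses = []
    for t in np.linspace(0.0, np.radians(azimuth_deg), num_frames):
        eye = center + np.array([radius * np.cos(t), radius * np.sin(t), height])
        poses.append((look_at(eye, center), eye))  # (rotation, camera position)
    return poses

# Example: a full 180-degree horizontal sweep over 14 output frames.
poses = orbit_trajectory(center=np.zeros(3), radius=4.0, height=2.0,
                         azimuth_deg=180.0)
print(len(poses), poses[0][1], poses[-1][1])
```

In the gradual mode, each generated frame would be conditioned on the corresponding pose along such a path; in the direct mode, every frame would use the final pose.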

Quick Start & Requirements

  • Installation: Create a Conda environment (conda create -n gcd python=3.10), activate it (conda activate gcd), install PyTorch with CUDA 12.1, then install the project dependencies: pip install git+https://github.com/OpenAI/CLIP.git, pip install git+https://github.com/Stability-AI/datapipelines.git, and pip install -r requirements.txt.
  • Prerequisites: Python 3.10+, PyTorch 2.0.1+ with CUDA 12.1, and significant disk space (7 TB for Kubric-4D, 4.4 TB for ParallelDomain-4D processed data).
  • Resources: Training requires multiple GPUs (e.g., 8x NVIDIA A100 or A6000) with substantial VRAM (around 50 GB per GPU); dataset processing also relies heavily on GPUs. A quick environment check is sketched after this list.
  • Links: Paper, Website, Results, Datasets, Models.
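With the environment set up, a short sanity check (not part of the repository) can confirm that PyTorch sees CUDA and that each GPU has roughly the VRAM the training setup expects:

```python
# Post-install sanity check for the PyTorch + CUDA setup (illustrative).
import torch

print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    vram_gb = props.total_memory / 1024**3
    # Training is reported to need around 50 GB of VRAM per GPU.
    print(f"GPU {i}: {props.name}, {vram_gb:.0f} GB VRAM")
```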

Highlighted Details

  • Achieves up to 23.47 dB PSNR on ParallelDomain-4D (RGB output) and 39.0% mIoU for semantic segmentation; the standard PSNR formula is sketched after this list.
  • Supports novel view synthesis with up to 180 degrees of horizontal camera displacement.
  • Includes tools for dataset generation using Kubric and processing for ParallelDomain-4D.
  • Offers Gradio-based inference for quick experimentation.
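The PSNR figure above follows the standard peak signal-to-noise-ratio definition for images in a fixed dynamic range. A minimal version is sketched below; this is the textbook formula, not necessarily the repository's exact evaluation code, which may differ in details such as masking or per-frame averaging.

```python
# Standard PSNR for float images in [0, max_val] (illustrative, not the
# repo's evaluation code).
import numpy as np

def psnr(pred, target, max_val=1.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((pred.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val**2 / mse)

# Example: a frame compared against a darkened copy of itself.
frame = np.random.rand(256, 256, 3)
print(f"{psnr(frame, frame * 0.9):.2f} dB")
```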

Maintenance & Community

The project is maintained by Basile Van Hoorick and collaborators from Columbia University and Toyota Research Institute. The codebase has been refactored for public release, with a note that thorough vetting is ongoing. Users are encouraged to report issues and suggest fixes.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. However, it depends on Stable Video Diffusion, which has its own licensing terms. Compatibility for commercial use or closed-source linking would require checking the specific licenses of all dependencies, including SVD.

Limitations & Caveats

The codebase has undergone refactoring and may contain undiscovered issues. The project primarily targets synthetic datasets (Kubric-4D, ParallelDomain-4D) and may perform best on similar data; the authors note that the models may not perform well on videos containing humans. Some ParallelDomain-4D dataset folders may be missing frames.

Health Check

  • Last commit: 9 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 13 stars in the last 90 days
