Neural_Gaffer by Haian-Jin

2D relighting diffusion model for single-image object relighting

created 1 year ago
310 stars

Top 87.8% on sourcepulse

View on GitHub
Project Summary

Neural Gaffer is an end-to-end 2D relighting diffusion model that accurately relights any object in a single image under various lighting conditions. It targets researchers and practitioners in computer vision and graphics, enabling applications like text-based relighting, object insertion, and serving as a prior for 3D relighting tasks.

How It Works

The model frames relighting as conditional diffusion: given a single input image of an object and a target lighting condition (an HDR environment map), it synthesizes the relit image. A key aspect is its ability to act as a prior for 3D relighting, directly relighting radiance fields without requiring inverse rendering, which is a novel application of diffusion models in this domain.
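For intuition, here is a minimal, schematic sketch of image-conditioned diffusion sampling for relighting. It assumes a DDIM-style sampler and Zero123-style conditioning (source-image latent concatenated channel-wise, lighting passed as an embedding); the function and argument names are illustrative stand-ins, not the repository's actual API.

```python
import torch

@torch.no_grad()
def relight_latent(unet, image_latent, light_embedding, steps=50):
    """Sample a relit latent conditioned on the source image and target lighting.

    `unet` is a hypothetical stand-in for the repository's conditional UNet.
    """
    x = torch.randn_like(image_latent)            # start from pure Gaussian noise
    alphas = torch.linspace(0.001, 0.999, steps)  # toy cumulative noise schedule
    for i in range(steps):
        t = torch.full((x.shape[0],), steps - 1 - i, device=x.device)
        # Condition on the source image by channel-wise concatenation, and on
        # the target lighting via an extra embedding, as in Zero123-style models.
        eps = unet(torch.cat([x, image_latent], dim=1), t, light_embedding)
        a = alphas[i]
        a_next = alphas[i + 1] if i + 1 < steps else torch.tensor(1.0)
        x0_pred = (x - (1 - a).sqrt() * eps) / a.sqrt()           # predicted clean latent
        x = a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps   # deterministic DDIM step
    return x
```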

Quick Start & Requirements

  • Installation: create and activate a conda environment, then run pip install -r requirements.txt followed by pip3 install -U xformers==0.0.28 --index-url https://download.pytorch.org/whl/cu118 (see the command sketch after this list).
  • Prerequisites: Python 3.9, CUDA 11.8 (for xformers), and PyTorch. Checkpoints and datasets must be downloaded separately.
  • Resources: 2D relighting inference on a single A6000 GPU takes ~20 minutes for 2,400 images (roughly 0.5 s per image); 3D relighting inference takes ~24 minutes for 2,888 images. Training requires 8 GPUs.
  • Links: Project Page, Paper
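Taken together, the quick-start steps look like the following. The pip commands are quoted from the repository's instructions; the conda environment name is illustrative:

```bash
# Create and activate a conda environment (name is illustrative)
conda create -n neural_gaffer python=3.9
conda activate neural_gaffer

# Install dependencies, then xformers built against CUDA 11.8
pip install -r requirements.txt
pip3 install -U xformers==0.0.28 --index-url https://download.pytorch.org/whl/cu118
```

Checkpoints and datasets still need to be fetched per the README before running inference.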

Highlighted Details

  • End-to-end 2D relighting of single images.
  • Enables text-based relighting and object insertion.
  • Acts as a prior for 3D radiance field relighting.
  • Official code for NeurIPS 2024 paper.

Maintenance & Community

The project accompanies a NeurIPS 2024 paper. The primary contributor is Haian Jin, and the README includes a TODO list of planned releases.

Licensing & Compatibility

The repository does not explicitly state a license. The codebase is built on top of Zero123-HF.

Limitations & Caveats

The model was trained at a 256x256 resolution, which limits its ability to preserve fine details and can lead to relighting failures. The VAE used in the base diffusion model struggles with identity preservation for detailed objects at this resolution. Finetuning at higher resolutions or using a more powerful diffusion model is suggested to mitigate this.
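As a practical consequence, inputs are best matched to the training resolution before inference. A minimal preprocessing sketch (file names are illustrative):

```python
# Resize the input to the 256x256 training resolution before inference;
# larger inputs would be out of distribution for the model.
from PIL import Image

img = Image.open("object.png").convert("RGB").resize((256, 256), Image.LANCZOS)
img.save("object_256.png")
```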

Health Check

  • Last commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

29 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering, Designing Machine Learning Systems), Patrick von Platen (core contributor to Hugging Face Transformers and Diffusers), and 12 more.

stablediffusion by Stability-AI

Latent diffusion model for high-resolution image synthesis
Top 0.1% · 41k stars · created 2 years ago · updated 1 month ago