3D object reconstruction from a single image
RealFusion addresses the challenge of generating a complete 360° 3D model of an object from a single input image. It targets computer vision and graphics researchers and practitioners who want to reconstruct objects with plausible appearance and shape, including the sides not visible in the input. Its primary benefit is state-of-the-art monocular 3D reconstruction, achieved by using a diffusion model as a prior for novel view synthesis.
How It Works
The method reconstructs a 3D object by fitting a neural radiance field (NeRF) to the input image. Because recovering a full object from a single view is ill-posed, it uses a conditional diffusion model (Stable Diffusion) to "dream up" novel views of the object. By combining a reconstruction objective on the input view, the diffusion model's prior on novel views, and regularization losses, RealFusion produces a consistent 3D reconstruction. Relying on a powerful pre-trained generative model lets it infer plausible shape and detail for parts of the object that the input image does not show.
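The way a frozen diffusion model can "dream up" novel views and push a 3D representation toward them can be sketched as follows, assuming a DreamFusion-style score distillation sampling (SDS) objective (the formulation reimplemented by stable-dreamfusion, on which this project builds). This is an illustrative toy, not the repository's code: a hypothetical ideal denoiser for one known clean image stands in for Stable Diffusion, the rendered pixels themselves stand in for NeRF parameters, and the timestep weight is fixed to 1. All names (add_noise, sds_gradient, make_toy_denoiser, demo) are hypothetical.

```python
import math
import random
from typing import Callable, List

Vector = List[float]
# Hypothetical denoiser signature: (noisy sample, alpha_bar) -> predicted noise.
Denoiser = Callable[[Vector, float], Vector]


def add_noise(x: Vector, eps: Vector, alpha_bar: float) -> Vector:
    """Forward diffusion: x_t = sqrt(alpha_bar) * x + sqrt(1 - alpha_bar) * eps."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * xi + s * ei for xi, ei in zip(x, eps)]


def sds_gradient(rendered: Vector, denoiser: Denoiser, alpha_bar: float,
                 weight: float, rng: random.Random) -> Vector:
    """SDS-style gradient with respect to a rendered novel view.

    The diffusion model stays frozen: noise is added to the rendering, the
    model predicts that noise, and the weighted residual (eps_hat - eps) is
    used directly as the gradient, without backpropagating through the model.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in rendered]
    x_t = add_noise(rendered, eps, alpha_bar)
    eps_hat = denoiser(x_t, alpha_bar)
    return [weight * (eh - e) for eh, e in zip(eps_hat, eps)]


def make_toy_denoiser(target: Vector) -> Denoiser:
    """Toy stand-in for Stable Diffusion: an ideal denoiser for one known clean image."""
    def denoise(x_t: Vector, alpha_bar: float) -> Vector:
        a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
        return [(xt - a * t) / s for xt, t in zip(x_t, target)]
    return denoise


def demo(steps: int = 200, seed: int = 0) -> Vector:
    """Gradient descent on a toy 'rendering' driven only by the SDS gradient."""
    rng = random.Random(seed)
    target = [0.8, 0.2, 0.5]    # the view the toy prior considers plausible
    rendered = [0.0, 0.0, 0.0]  # stand-in for a NeRF rendering of a novel view
    denoiser = make_toy_denoiser(target)
    for _ in range(steps):
        alpha_bar = rng.uniform(0.2, 0.9)  # random diffusion noise level
        grad = sds_gradient(rendered, denoiser, alpha_bar, weight=1.0, rng=rng)
        rendered = [r - 0.05 * g for r, g in zip(rendered, grad)]
    return rendered


if __name__ == "__main__":
    print([round(r, 3) for r in demo()])
```

With this ideal toy denoiser the gradient works out to weight · sqrt(alpha_bar / (1 - alpha_bar)) · (rendered - target), so descent pulls the rendering toward the prior's image. In the actual method the gradient would instead flow through the differentiable NeRF renderer into the field's parameters and be combined with the input-view reconstruction loss and the regularization terms.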
Quick Start & Requirements
Install the Python dependencies listed in requirements.txt with pip install -r requirements.txt.
Build the bundled CUDA extensions: pip install ./raymarching, pip install ./shencoder, pip install ./freqencoder, pip install ./gridencoder.
Install nvdiffrast for mesh export: pip install git+https://github.com/NVlabs/nvdiffrast/
The script scripts/extract-mask.py is provided for mask extraction.
Pass the --O flag to enable automatic optimization settings (it includes --cuda_ray).
Highlighted Details
Mesh export is supported through nvdiffrast.
Maintenance & Community
The project is based on stable-dreamfusion and HuggingFace's diffusers library. The author intends to continue supporting and improving the repository. Contributions via pull requests are welcome.
Licensing & Compatibility
The repository does not explicitly state a license in the README. The underlying models (Stable Diffusion, diffusers) have their own licenses. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
Results are sensitive to hyperparameter tuning, and the method may not work well on every image. Known failure modes include non-solid objects and geometric distortions, such as the Janus problem on faces, where a face appears on more than one side of the object. The textual inversion step is computationally intensive.