realfusion by lukemelas

3D object reconstruction from a single image

Created 2 years ago

564 stars

Top 56.9% on SourcePulse

1 Expert Loves This Project

Eric Zhang

Founding Engineer at Modal

Project Summary

RealFusion generates a complete 360° 3D model of an object from a single input image. It is aimed at computer vision and graphics researchers and practitioners who need reconstructions with plausible appearance and shape, including for the sides of the object that the image does not show. Its main benefit is monocular 3D reconstruction quality that the project describes as state-of-the-art, obtained by using a diffusion model to synthesize novel views.

How It Works

The method reconstructs a 3D object by fitting a neural radiance field (NeRF) to the input image. Because a single view leaves the problem ill-posed, it uses a conditional diffusion model (Stable Diffusion) to "dream up" novel views of the object. RealFusion fuses three signals (the input view, the diffusion model's prior, and regularization losses) to produce a consistent 3D reconstruction. The advantage of this approach is that a powerful pre-trained generative model supplies plausible content for the parts and details of the object that the image never shows.
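The fusion of the three signals can be pictured as a weighted sum of loss terms. The following is a minimal sketch under stated assumptions, not the repository's actual code: the names `LossWeights`, `masked_mse`, and `fused_objective`, the default weights, and the mask-weighted reconstruction error are all hypothetical, and the diffusion prior term is passed in as a plain number because computing it requires a diffusion model.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class LossWeights:
    # Hypothetical weights for illustration; in practice these are
    # hyperparameters that would need tuning.
    reconstruction: float = 1.0
    diffusion_prior: float = 1.0
    regularization: float = 0.1


def masked_mse(rendered: Sequence[float],
               target: Sequence[float],
               mask: Sequence[float]) -> float:
    """Mask-weighted mean squared error between rendered and target pixels."""
    if not (len(rendered) == len(target) == len(mask)):
        raise ValueError("rendered, target, and mask must have the same length")
    weight_sum = sum(mask)
    if weight_sum == 0:
        return 0.0  # no object pixels, so there is nothing to compare
    squared = sum(m * (r - t) ** 2 for r, t, m in zip(rendered, target, mask))
    return squared / weight_sum


def fused_objective(recon_loss: float,
                    prior_loss: float,
                    reg_loss: float,
                    weights: Optional[LossWeights] = None) -> float:
    """Combine the input-view, diffusion-prior, and regularization terms."""
    w = weights if weights is not None else LossWeights()
    return (w.reconstruction * recon_loss
            + w.diffusion_prior * prior_loss
            + w.regularization * reg_loss)


if __name__ == "__main__":
    # Toy flattened "image": three pixels, the last one outside the mask.
    recon = masked_mse([1.0, 0.0, 0.5], [0.0, 0.0, 0.5], [1.0, 1.0, 0.0])
    print(fused_objective(recon, prior_loss=2.0, reg_loss=1.0))
```

In this sketch the input-view term constrains only the observed side, while the prior and regularization terms would be evaluated on renders from other camera poses; the relative weights decide how strongly each signal shapes the result.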

Quick Start & Requirements

Install dependencies with pip install -r requirements.txt.
Build custom CUDA extensions: pip install ./raymarching, pip install ./shencoder, pip install ./freqencoder, pip install ./gridencoder.
Optional: pip install git+https://github.com/NVlabs/nvdiffrast/ for mesh export.
PyTorch and torchvision are required but are not listed in requirements.txt, so install them separately.
Input: an RGBA image (object + mask). The script scripts/extract-mask.py is provided for extracting the mask.
Textual Inversion: takes about 1 hour per run on a V100 GPU.
Reconstruction: the --O flag is recommended; it turns on the recommended optimization options, including --cuda_ray.
Example commands and checkpoints are provided with the repository.
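The installation steps above can be collected into a small helper script. This is a sketch, not something the repository ships: the function names `install_commands` and `run_install` are hypothetical, and it assumes it is run from the repository root with PyTorch and torchvision already installed. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys
from typing import List

# Local CUDA extension directories named in the steps above.
EXTENSIONS = ["./raymarching", "./shencoder", "./freqencoder", "./gridencoder"]
# Optional dependency, needed only for mesh export.
NVDIFFRAST_URL = "git+https://github.com/NVlabs/nvdiffrast/"


def install_commands(with_mesh_export: bool = False) -> List[List[str]]:
    """Return the pip invocations in the order listed above."""
    pip = [sys.executable, "-m", "pip", "install"]
    commands = [pip + ["-r", "requirements.txt"]]
    commands += [pip + [ext] for ext in EXTENSIONS]
    if with_mesh_export:
        commands.append(pip + [NVDIFFRAST_URL])
    return commands


def run_install(with_mesh_export: bool = False, dry_run: bool = True) -> None:
    """Print each command, and execute it unless dry_run is set."""
    for cmd in install_commands(with_mesh_export):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_install(with_mesh_export=False, dry_run=True)
```

Calling `run_install(dry_run=False)` would execute the steps for real; using `sys.executable -m pip` keeps the installs in the same Python environment that runs the script.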

Highlighted Details

Leverages Stable Diffusion for novel view synthesis.
Implements Textual Inversion to create object-specific embeddings.
Exposes many hyperparameters for tuning reconstruction quality.
Supports mesh export via nvdiffrast.

Maintenance & Community

The project is based on stable-dreamfusion and Hugging Face's diffusers library. The author states an intention to keep supporting and improving the repository. Contributions via pull requests are welcome.

Licensing & Compatibility

The README does not explicitly state a license. The underlying components (the Stable Diffusion model and the diffusers library) carry their own licenses. Whether the code may be used commercially or linked from closed-source software is not specified.

Limitations & Caveats

Results are sensitive to hyperparameter tuning, and the method may not work well on all images. Failure modes include non-solid objects and geometric distortions (e.g., the Janus problem for faces). The textual inversion step is computationally intensive.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days