realfusion by lukemelas

3D object reconstruction from a single image

created 2 years ago
560 stars

Top 58.2% on sourcepulse

View on GitHub
1 Expert Loves This Project
Project Summary

RealFusion addresses the challenge of generating a complete 360° 3D model of an object from a single input image. It targets researchers and practitioners in computer vision and graphics seeking to reconstruct objects with plausible appearance and shape, even for unseen sides. The primary benefit is achieving state-of-the-art monocular 3D reconstruction results by leveraging diffusion models for novel view synthesis.

How It Works

The method reconstructs a 3D object by fitting a neural radiance field (NeRF) to the input image. To overcome the ill-posed nature of this task, it employs a conditional diffusion model (Stable Diffusion) to "dream up" novel views of the object. By fusing the input view, the diffusion model's prior, and regularization losses, RealFusion generates a consistent 3D reconstruction. This approach is advantageous as it leverages powerful pre-trained generative models to infer unseen object parts and details.
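
Conceptually, each optimization step combines two terms: a reconstruction loss tying the NeRF to the input view, and a score-distillation (SDS) style loss pulling random novel views toward the diffusion prior. The sketch below illustrates this idea; the `render`, `encode`, `add_noise`, and `predict_noise` interfaces are hypothetical stand-ins, not RealFusion's actual API.

```python
# Conceptual sketch of the per-iteration objective; interfaces are illustrative.
import torch

def training_step(nerf, diffusion, input_rgb, input_mask, input_pose, novel_pose):
    # 1) Reconstruction loss: the render from the input viewpoint must match
    #    the given image and its object mask.
    rgb, alpha = nerf.render(input_pose)
    loss_rec = ((rgb - input_rgb) ** 2).mean() + ((alpha - input_mask) ** 2).mean()

    # 2) Prior loss on a random novel view via score distillation (SDS):
    #    noise the rendered view's latents, then ask the frozen diffusion
    #    model to denoise them, conditioned on the object-specific prompt.
    novel_rgb, _ = nerf.render(novel_pose)
    latents = diffusion.encode(novel_rgb)
    t = torch.randint(20, 980, (1,))           # random diffusion timestep
    noise = torch.randn_like(latents)
    noisy = diffusion.add_noise(latents, noise, t)
    with torch.no_grad():
        noise_pred = diffusion.predict_noise(noisy, t, prompt="<object>")
    # Surrogate loss whose gradient w.r.t. `latents` is (noise_pred - noise),
    # i.e. the (unweighted) SDS update pushed back into the NeRF parameters.
    loss_sds = ((noise_pred - noise).detach() * latents).sum()

    # RealFusion also adds regularization losses; they are omitted here.
    return loss_rec + loss_sds
```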

Quick Start & Requirements

  • Install dependencies with pip install -r requirements.txt.
  • Build custom CUDA extensions: pip install ./raymarching, pip install ./shencoder, pip install ./freqencoder, pip install ./gridencoder.
  • Optional: pip install git+https://github.com/NVlabs/nvdiffrast/ for mesh export.
  • PyTorch and torchvision are also required; they are not listed in requirements.txt.
  • Input: an RGBA image (object plus mask). The script scripts/extract-mask.py is provided for mask extraction; see the sketch after this list.
  • Textual Inversion: takes roughly 1 hour per run on a V100 GPU.
  • Reconstruction: use the --O flag, which enables the recommended optimization settings (including --cuda_ray).
  • Example commands and checkpoints are available in the repository.
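
Since the pipeline expects an RGBA input whose alpha channel serves as the object mask, input preparation amounts to splitting channels. A minimal sketch; the file path is a placeholder, and compositing onto white is one common preprocessing choice, not necessarily the repo's:

```python
# Split an RGBA input into RGB and mask; path and compositing are assumptions.
from PIL import Image
import numpy as np

img = Image.open("path/to/input.png").convert("RGBA")
arr = np.asarray(img).astype(np.float32) / 255.0
rgb, alpha = arr[..., :3], arr[..., 3:]   # alpha channel doubles as the mask
rgb_white = rgb * alpha + (1.0 - alpha)   # composite the object onto white
```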

Highlighted Details

  • Leverages Stable Diffusion for novel view synthesis.
  • Implements Textual Inversion to create object-specific embeddings (sketched below).
  • Offers extensive hyperparameter tuning for reconstruction quality.
  • Supports mesh export via nvdiffrast.
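
Textual Inversion learns an embedding for a new pseudo-token (e.g. "<object>") by minimizing the standard diffusion denoising loss on the input image while the model itself stays frozen. A conceptual sketch, with hypothetical `add_noise`, `embed_prompt`, and `unet` interfaces:

```python
# Conceptual textual-inversion step: only the new token's embedding trains;
# the diffusion model is frozen. Interfaces here are illustrative.
import torch

def textual_inversion_step(token_embedding, diffusion, image_latents):
    t = torch.randint(0, 1000, (1,))              # random timestep
    noise = torch.randn_like(image_latents)
    noisy = diffusion.add_noise(image_latents, noise, t)
    # Condition on a prompt containing the learnable pseudo-token.
    cond = diffusion.embed_prompt("a photo of <object>", token_embedding)
    noise_pred = diffusion.unet(noisy, t, cond)
    return ((noise_pred - noise) ** 2).mean()     # standard denoising loss
```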

Maintenance & Community

The project builds on stable-dreamfusion and Hugging Face's diffusers library. The author intends to continue supporting and improving the repository, and contributions via pull requests are welcome.

Licensing & Compatibility

The repository does not explicitly state a license in the README. The underlying models (Stable Diffusion, diffusers) have their own licenses. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The method's performance is sensitive to hyperparameter tuning and may not work well on all images, with failure modes including non-solid objects and geometric distortions (e.g., the Janus problem for faces). The textual inversion step is computationally intensive.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley), and 4 more.

taming-transformers by CompVis

Top 0.1% · 6k stars
Image synthesis research paper using transformers
created 4 years ago · updated 1 year ago
Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 12 more.

stablediffusion by Stability-AI

Top 0.1% · 41k stars
Latent diffusion model for high-resolution image synthesis
created 2 years ago · updated 1 month ago