Make-It-3D  by junshutang

3D creation from a single image using diffusion prior (ICCV 2023)

created 2 years ago
1,867 stars

Top 23.7% on sourcepulse

Project Summary

Make-It-3D addresses the challenge of generating high-fidelity 3D models from a single 2D image, a task complicated by the need to infer unseen geometry and textures. It targets researchers and developers in computer graphics and AI, offering a novel approach that leverages diffusion models for 3D-aware supervision, enabling applications like text-to-3D creation.

How It Works

The method employs a two-stage optimization pipeline. The first stage optimizes a neural radiance field (NeRF) using constraints from the input image and a diffusion prior for novel views. The second stage refines this coarse model into textured point clouds, further enhancing realism with the diffusion prior and high-quality textures from the original image. This diffusion prior acts as a powerful 3D-aware regularizer, guiding the reconstruction of unseen parts.
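The loss structure described above can be sketched in a few lines. This is only an illustrative outline, not the project's code: the function names, the uniform azimuth sampling, and the `prior_weight` parameter are assumptions, and the two loss callables stand in for the real image-reconstruction and diffusion-prior terms. No parameters are actually optimized here; the sketch only shows how each step might combine supervision from the input image with the prior evaluated at a randomly sampled novel view.

```python
import random
from typing import Callable, Dict, List


def stage_losses(steps: int,
                 reference_loss: Callable[[], float],
                 prior_loss: Callable[[float], float],
                 prior_weight: float = 1.0,
                 seed: int = 0) -> List[float]:
    """Return the combined loss for each step of one stage.

    reference_loss: hypothetical stand-in for the input-image constraint.
    prior_loss: hypothetical stand-in for the diffusion prior, evaluated
                at a novel-view azimuth given in degrees.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(steps):
        # Assumption: novel views are sampled uniformly around the object.
        azimuth = rng.uniform(-180.0, 180.0)
        totals.append(reference_loss() + prior_weight * prior_loss(azimuth))
    return totals


def two_stage_pipeline(coarse_steps: int,
                       refine_steps: int,
                       reference_loss: Callable[[], float],
                       prior_loss: Callable[[float], float]) -> Dict[str, List[float]]:
    """Run the coarse stage, then the refinement stage, and log both."""
    return {
        # Stage 1: coarse NeRF supervised by the image and the prior.
        "coarse": stage_losses(coarse_steps, reference_loss, prior_loss, seed=0),
        # Stage 2: textured point cloud refinement, again guided by the prior.
        "refine": stage_losses(refine_steps, reference_loss, prior_loss, seed=1),
    }
```

In a real implementation each stage would update model parameters by backpropagating the combined loss; the bookkeeping above only makes the two-term structure and the stage ordering explicit.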

Quick Start & Requirements

  • Installation: pip install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio===0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html, followed by several other pip install git+... commands for dependencies such as tiny-cuda-nn, CLIP, diffusers, huggingface_hub, pytorch3d, and contextual_loss_pytorch. The packages listed in requirements.txt and the raymarching module must also be installed.
  • Prerequisites: CUDA 11.3, Python 3.x, Hugging Face token for Stable Diffusion access. Requires pre-trained weights for DPT (depth estimation) and Segment Anything Model (SAM).
  • Setup: Requires cloning DPT and downloading weights. Estimated setup time is moderate due to multiple complex dependencies and model downloads.
  • Links: Project page: https://make-it-3d.github.io/
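Because setup involves several packages, model weights, and a token, a small preflight check run before the first job can catch missing pieces early. The sketch below uses only the Python standard library. The weight file paths and the `HF_TOKEN` variable name are assumptions for illustration; the project may expect different locations or a different login mechanism.

```python
import importlib.util
import os
from pathlib import Path
from typing import Iterable, List, Mapping, Optional


def preflight(modules: Iterable[str],
              weight_files: Iterable[str],
              token_env: str = "HF_TOKEN",
              environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if environ is None else environ
    problems = []
    for name in modules:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            problems.append(f"missing module: {name}")
    for path in weight_files:
        if not Path(path).is_file():
            problems.append(f"missing weights: {path}")
    if not env.get(token_env):
        problems.append(f"environment variable {token_env} is not set")
    return problems
```

A call such as `preflight(["torch", "diffusers", "huggingface_hub", "pytorch3d"], ["dpt_weights.pt", "sam_weights.pth"])` would report whichever of those are absent; both file names are placeholders, not the names the repository uses.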

Highlighted Details

  • ICCV 2023 publication.
  • Leverages Stable Diffusion 2.0 as a diffusion prior.
  • Utilizes DPT for depth estimation and SAM for masking.
  • Supports text-conditioned 3D creation and texture editing.

Maintenance & Community

The project is associated with ICCV 2023. A Jittor implementation is also available. The README lists planned features (now completed) and acknowledges borrowing heavily from Stable-Dreamfusion.

Licensing & Compatibility

The repository does not explicitly state a license. The dependencies include libraries with various licenses (e.g., PyTorch, Hugging Face libraries). Commercial use may require careful review of all dependency licenses.

Limitations & Caveats

The method struggles with complex scenes and with images that do not feature a single, centered object; in such cases it may fail to reconstruct solid geometry.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 22 stars in the last 90 days
