Make-It-3D by junshutang

3D creation from a single image using diffusion prior (ICCV 2023)

Created 2 years ago
1,877 stars

Top 23.2% on SourcePulse

Project Summary

Make-It-3D addresses the challenge of generating high-fidelity 3D models from a single 2D image, a task complicated by the need to infer unseen geometry and textures. It targets researchers and developers in computer graphics and AI, offering a novel approach that leverages diffusion models for 3D-aware supervision, enabling applications like text-to-3D creation.

How It Works

The method employs a two-stage optimization pipeline. The first stage optimizes a neural radiance field (NeRF) using constraints from the input image and a diffusion prior for novel views. The second stage refines this coarse model into textured point clouds, further enhancing realism with the diffusion prior and high-quality textures from the original image. This diffusion prior acts as a powerful 3D-aware regularizer, guiding the reconstruction of unseen parts.
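
The first-stage objective described above can be pictured as a weighted sum of a reference-view term and a diffusion-prior term. The following is only a minimal sketch under stated assumptions: the names `StageOneWeights`, `reference_loss`, and `stage_one_objective`, the mean-squared-error form of the reference term, and the default weights are all hypothetical and are not taken from the repository. The diffusion-prior value is passed in as a plain number standing in for whatever the real pipeline computes on novel views.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageOneWeights:
    """Hypothetical loss weights; the real values are not given in this summary."""
    reference: float = 1.0
    prior: float = 0.1


def reference_loss(rendered, target):
    """Mean squared error between the rendered reference view and the input image.

    Both arguments are flat sequences of pixel intensities (an assumption
    made to keep the sketch dependency-free).
    """
    if len(rendered) != len(target) or not rendered:
        raise ValueError("rendered and target must be non-empty and equal length")
    return sum((r - t) ** 2 for r, t in zip(rendered, target)) / len(rendered)


def stage_one_objective(rendered_ref, input_pixels, prior_loss_novel_view,
                        weights=StageOneWeights()):
    """Combine the input-image constraint with a diffusion-prior term.

    `prior_loss_novel_view` is a placeholder scalar for the diffusion-prior
    loss evaluated on a novel view.
    """
    return (weights.reference * reference_loss(rendered_ref, input_pixels)
            + weights.prior * prior_loss_novel_view)
```

In this sketch, a render that matches the input image exactly contributes nothing through the reference term, so the objective is driven entirely by the prior on unseen views, which mirrors the regularizing role described above.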

Quick Start & Requirements

  • Installation: Run pip install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio===0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html, followed by several other pip install git+... commands for dependencies such as tiny-cuda-nn, CLIP, diffusers, huggingface_hub, pytorch3d, and contextual_loss_pytorch. The packages listed in requirements.txt and the raymarching module must also be installed.
  • Prerequisites: CUDA 11.3, Python 3.x, Hugging Face token for Stable Diffusion access. Requires pre-trained weights for DPT (depth estimation) and Segment Anything Model (SAM).
  • Setup: Clone the DPT repository and download its pre-trained weights. Expect setup to take some time, since there are multiple complex dependencies to install and several models to download.
  • Links: Project page: https://make-it-3d.github.io/
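
Because the setup involves many separately installed packages, a quick pre-flight check can help catch a missing dependency before a long run. The script below is a hedged sketch, not part of the project: the import names in `REQUIRED_MODULES` (for example `tinycudann` for tiny-cuda-nn and `contextual_loss` for contextual_loss_pytorch) are assumptions inferred from the dependency list above, and the `HF_TOKEN` environment variable is only one assumed way of supplying the Hugging Face token.

```python
import importlib.util
import os
import sys

# Import names are assumptions inferred from the dependency list above;
# adjust them if the installed packages expose different module names.
REQUIRED_MODULES = [
    "torch",
    "torchvision",
    "tinycudann",
    "clip",
    "diffusers",
    "huggingface_hub",
    "pytorch3d",
    "contextual_loss",
]


def check_environment(modules=REQUIRED_MODULES, token_var="HF_TOKEN"):
    """Report which prerequisites appear to be missing.

    Only checks that each top-level module can be located; it does not
    import the modules or verify their versions.
    """
    missing = [name for name in modules
               if importlib.util.find_spec(name) is None]
    return {
        "python_ok": sys.version_info.major == 3,
        "missing_modules": missing,
        # Assumed token location; a cached CLI login would not be detected.
        "token_set": bool(os.environ.get(token_var)),
    }


if __name__ == "__main__":
    report = check_environment()
    print("Python 3.x:", report["python_ok"])
    print("Missing modules:", ", ".join(report["missing_modules"]) or "none")
    print("Hugging Face token set:", report["token_set"])
```

The check does not confirm the CUDA 11.3 toolchain or the presence of the DPT and SAM weights, which still need to be verified separately.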

Highlighted Details

  • ICCV 2023 publication.
  • Leverages Stable Diffusion 2.0 as a diffusion prior.
  • Utilizes DPT for depth estimation and SAM for masking.
  • Supports text-conditioned 3D creation and texture editing.

Maintenance & Community

The project is associated with ICCV 2023. A Jittor implementation is also available. The README lists planned features (now completed) and acknowledges borrowing heavily from Stable-Dreamfusion.

Licensing & Compatibility

The repository does not explicitly state a license. The dependencies include libraries with various licenses (e.g., PyTorch, Hugging Face libraries). Commercial use may require careful review of all dependency licenses.

Limitations & Caveats

The method is noted to struggle with complex scenes and with images that do not feature a single, centered object; in such cases it may fail to reconstruct solid geometry.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 11 stars in the last 30 days
