Text-to-3D model using NeRF and diffusion
This repository provides a PyTorch implementation of DreamFusion, a text-to-3D model leveraging Stable Diffusion. It enables generating 3D models from text prompts and images, with mesh export capabilities. The project targets researchers and developers in 3D content generation and AI art.
How It Works
The core approach replaces Imagen with Stable Diffusion. Because Stable Diffusion operates in latent space, the NeRF render must first be encoded into latents, which requires backpropagating through the VAE encoder. A multi-resolution grid encoder (torch-ngp) accelerates NeRF rendering (around 10 FPS at 800x800). Recent updates add support for Perp-Neg to mitigate the multi-head (Janus) problem in text-to-3D generation.
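To make the latent-space detail concrete, here is a minimal sketch of one score distillation sampling (SDS) step, assuming diffusers-style VAE, U-Net, and scheduler interfaces; the renderer output nerf_rgb and the omitted timestep weighting are illustrative stand-ins, not the repository's exact code.

```python
import torch

def sds_step(vae, unet, scheduler, text_emb, nerf_rgb):
    """One SDS step. nerf_rgb: (B, 3, 512, 512) render in [0, 1], requires_grad=True."""
    # Encode the render into latent space; gradients flow back through
    # the VAE encoder into the NeRF parameters (0.18215 is SD's latent scale).
    latents = vae.encode(nerf_rgb * 2 - 1).latent_dist.sample() * 0.18215

    # Forward diffusion: pick a random timestep and noise the latents.
    t = torch.randint(20, 980, (1,), device=latents.device)
    noise = torch.randn_like(latents)
    noisy = scheduler.add_noise(latents, noise, t)

    # Predict the noise with the frozen U-Net (no grad through the U-Net).
    with torch.no_grad():
        noise_pred = unet(noisy, t, encoder_hidden_states=text_emb).sample

    # SDS skips the U-Net Jacobian: the residual is injected directly as
    # the latents' gradient, then flows back through the VAE encoder.
    grad = noise_pred - noise  # timestep weighting w(t) omitted for brevity
    latents.backward(gradient=grad)
```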
Quick Start & Requirements
Install dependencies with pip install -r requirements.txt. Image-conditioned generation additionally requires downloading the Zero-1-to-3 checkpoint (zero123-xl.ckpt) and the Omnidata checkpoints (omnidata_dpt_depth_v2.ckpt, omnidata_dpt_normal_v2.ckpt).
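A sketch of a typical setup-and-run sequence follows; the main.py flags reflect the project's documented usage, but treat the prompt and workspace name as illustrative:

```sh
pip install -r requirements.txt

# text-to-3D with Stable Diffusion guidance (-O enables the recommended
# options, e.g. fp16 and the CUDA ray-marching backend)
python main.py --text "a hamburger" --workspace trial -O

# export the trained NeRF as a textured mesh
python main.py --workspace trial -O --test --save_mesh
```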
Highlighted Details
Maintenance & Community
The project is a work in progress under active development, as indicated by its recent updates. The primary contributor is Jiaxiang Tang.
Licensing & Compatibility
The repository itself is not explicitly licensed in the README. However, it depends on models and libraries with their own licenses (e.g., Stable Diffusion, Zero-1-to-3, diffusers). Users must adhere to the terms of these underlying components, particularly for commercial use.
Limitations & Caveats
The project is explicitly described as a work in progress: generation quality does not match the original paper, and many prompts fail. The main divergence from the original DreamFusion is the use of Stable Diffusion in place of Imagen.