stable-dreamfusion by ashawkey

Text-to-3D model using NeRF and diffusion

Created 2 years ago
8,720 stars

Top 5.9% on SourcePulse

Project Summary

This repository provides a PyTorch implementation of DreamFusion, a text-to-3D model leveraging Stable Diffusion. It enables generating 3D models from text prompts and images, with mesh export capabilities. The project targets researchers and developers in 3D content generation and AI art.

How It Works

The core approach replaces Imagen with Stable Diffusion. Because Stable Diffusion is a latent diffusion model, the loss is computed in latent space, so gradients must be backpropagated through the VAE encoder, which adds training cost. A multi-resolution grid encoder (from torch-ngp) speeds up NeRF rendering to around 10 FPS at 800x800. Recent updates add support for Perp-Neg to mitigate the multi-head problem in text-to-3D generation.
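
As a rough illustration of the latent-space loss described above, the following NumPy sketch computes a score-distillation-style gradient on the latents and then pushes it back through a toy linear encoder by hand. The linear encoder, the `predict_noise` stand-in, and the `1 - alpha_bar` weighting are assumptions made for this example; it is not the repository's code, which relies on PyTorch autograd and the real Stable Diffusion components.

```python
import numpy as np

def sds_latent_grad(latents, predict_noise, alpha_bar, rng):
    """Score-distillation-style gradient with respect to the latents.

    predict_noise is a hypothetical stand-in for a text-conditioned denoiser.
    """
    noise = rng.standard_normal(latents.shape)
    # Forward diffusion of the latents to the chosen noise level.
    noisy = np.sqrt(alpha_bar) * latents + np.sqrt(1.0 - alpha_bar) * noise
    # The denoiser output is treated as a constant: no gradient flows through it.
    noise_pred = predict_noise(noisy)
    weight = 1.0 - alpha_bar  # assumed weighting w(t)
    return weight * (noise_pred - noise)

def grad_through_linear_encoder(grad_latents, encoder):
    """Chain rule for latents = encoder @ image: dL/dimage = encoder.T @ dL/dlatents."""
    return encoder.T @ grad_latents

rng = np.random.default_rng(0)
image = rng.standard_normal(12)          # flattened toy "rendered image"
encoder = rng.standard_normal((4, 12))   # toy linear stand-in for the VAE encoder
latents = encoder @ image

grad_latents = sds_latent_grad(latents, lambda z: 0.5 * z, alpha_bar=0.7, rng=rng)
grad_image = grad_through_linear_encoder(grad_latents, encoder)
print(grad_latents.shape, grad_image.shape)  # (4,) (12,)
```

In a real pipeline the image gradient would continue flowing into the NeRF parameters; the extra encoder step is the cost that operating in latent space introduces.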

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Pre-trained Models: Download Zero-1-to-3 (zero123-xl.ckpt) and Omnidata checkpoints (omnidata_dpt_depth_v2.ckpt, omnidata_dpt_normal_v2.ckpt).
  • Dependencies: PyTorch, diffusers, CUDA (optional when using the Taichi backend).
  • GPU: Recommended for performance; ~16GB VRAM for Instant-NGP backbone.
  • Docs: Advanced Tips
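
Before launching a run, it can help to confirm that the checkpoints named above are actually on disk. The sketch below is a hypothetical helper, not part of the repository: the file names come from the list above, while the `missing_checkpoints` function and the directory passed to it are assumptions for illustration.

```python
from pathlib import Path

# File names are taken from the requirements list; where they live is up to the caller.
EXPECTED_CHECKPOINTS = (
    "zero123-xl.ckpt",
    "omnidata_dpt_depth_v2.ckpt",
    "omnidata_dpt_normal_v2.ckpt",
)

def missing_checkpoints(root, expected=EXPECTED_CHECKPOINTS):
    """Return the expected checkpoint names that are not found anywhere under root."""
    root = Path(root)
    # Search recursively so the helper does not depend on a specific folder layout.
    found = {p.name for p in root.rglob("*.ckpt")} if root.is_dir() else set()
    return [name for name in expected if name not in found]
```

Calling `missing_checkpoints("pretrained")` (the directory name here is only an example) returns an empty list once all three files are present.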

Highlighted Details

  • Supports Instant-NGP backbone for faster rendering and lower VRAM usage.
  • Offers a CUDA-free Taichi backend for NeRF.
  • Enables image-conditioned 3D generation using Zero-1-to-3.
  • Supports DMTet for mesh finetuning and export.
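
The Instant-NGP backbone listed above builds on a multi-resolution grid encoding: a point is looked up in several feature grids of increasing resolution and the interpolated features are concatenated. The NumPy sketch below is a deliberately simplified illustration (2D instead of 3D, dense grids instead of hashed ones, random features instead of learned ones) and does not reproduce torch-ngp's implementation; all names and sizes in it are assumptions.

```python
import numpy as np

def bilinear_lookup(grid, xy):
    """Bilinearly interpolate a (R, R, F) feature grid at a point xy in [0, 1]^2 (R >= 2)."""
    res = grid.shape[0]
    pos = np.clip(np.asarray(xy, dtype=float), 0.0, 1.0) * (res - 1)
    lo = np.minimum(np.floor(pos).astype(int), res - 2)  # lower-corner cell index
    fx, fy = pos - lo                                    # fractional offsets inside the cell
    x0, y0 = lo
    along_x_at_y0 = (1 - fx) * grid[x0, y0] + fx * grid[x0 + 1, y0]
    along_x_at_y1 = (1 - fx) * grid[x0, y0 + 1] + fx * grid[x0 + 1, y0 + 1]
    return (1 - fy) * along_x_at_y0 + fy * along_x_at_y1

def encode(xy, grids):
    """Concatenate interpolated features from every resolution level."""
    return np.concatenate([bilinear_lookup(g, xy) for g in grids])

rng = np.random.default_rng(0)
resolutions = [2, 4, 8, 16]   # coarse to fine; values chosen for illustration
features_per_level = 2
grids = [rng.standard_normal((r, r, features_per_level)) for r in resolutions]

feature = encode([0.3, 0.7], grids)
print(feature.shape)  # (8,)
```

In a NeRF setting, a feature vector like this would be fed to a small MLP that predicts density and color, which is where the speedup over a large coordinate MLP comes from.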

Maintenance & Community

The README describes the project as a work in progress and lists recent updates, although the health check below reports the last commit as a year old. The primary contributor is Jiaxiang Tang.

Licensing & Compatibility

The README does not state a license for the repository itself. The project does depend on models and libraries that carry their own licenses (e.g., Stable Diffusion, Zero-1-to-3, diffusers), and users must follow those terms, particularly for commercial use.

Limitations & Caveats

The README states explicitly that the project is a work in progress: output quality does not yet match the original paper, and many prompts still fail. It also notes differences from the original DreamFusion paper, chiefly the use of Stable Diffusion instead of Imagen.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 27 stars in the last 30 days
