sjc by pals-ttic

Research paper for 3D generation from 2D diffusion models

created 2 years ago
517 stars

Top 61.5% on sourcepulse

Project Summary

This repository implements Score Jacobian Chaining (SJC), a method for generating 3D assets from pretrained 2D diffusion models. It targets researchers and practitioners in computer vision and graphics who want to reuse 2D diffusion priors for 3D generation.

How It Works

SJC applies the chain rule to a diffusion model's learned score function: the 2D score of a rendered image is backpropagated through the Jacobian of a differentiable renderer (here, a voxel radiance field). Aggregating these scores over multiple viewpoints yields a 3D score, so an existing 2D model can drive 3D generation. Rendered images are not the noisy inputs the diffusion model was trained on, so evaluating the score on them directly causes a distribution mismatch; the paper addresses this with a new estimation mechanism, which it calls Perturb-and-Average Scoring.
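The chain-rule step can be illustrated with a toy example. The sketch below is not the repository's code: it assumes a linear "renderer" (so its Jacobian is just the view matrix), a Gaussian stand-in for the 2D diffusion score, and hypothetical names (`render`, `score_2d`, `vjp`, `score_3d`, `generate`). It leaves out diffusion noise levels and the handling of distribution mismatch.

```python
# Toy score Jacobian chaining. The linear renderer, the Gaussian score,
# and every function name here are illustrative assumptions.

def render(theta, view):
    """Toy linear renderer: each pixel is a weighted sum of the 3D parameters."""
    return [sum(w * t for w, t in zip(row, theta)) for row in view]


def score_2d(image, target, sigma=1.0):
    """Score of N(target, sigma^2 I) at `image`; stands in for a diffusion score."""
    return [(m - x) / sigma**2 for x, m in zip(image, target)]


def vjp(view, pixel_score):
    """Jacobian-transpose product J^T s; for the linear renderer, J is `view`."""
    n_params = len(view[0])
    return [
        sum(row[j] * s for row, s in zip(view, pixel_score))
        for j in range(n_params)
    ]


def score_3d(theta, views, targets):
    """Average the chained 2D scores over all viewpoints."""
    total = [0.0] * len(theta)
    for view, target in zip(views, targets):
        chained = vjp(view, score_2d(render(theta, view), target))
        total = [a + c for a, c in zip(total, chained)]
    return [t / len(views) for t in total]


def generate(views, targets, n_params, lr=0.5, n_steps=200):
    """Gradient ascent on the 3D parameters using the aggregated score."""
    theta = [0.0] * n_params
    for _ in range(n_steps):
        step = score_3d(theta, views, targets)
        theta = [t + lr * s for t, s in zip(theta, step)]
    return theta


# Three viewpoints, two pixels each, observing three "voxels".
views = [
    [[1, 0, 0], [0, 1, 1]],
    [[0, 1, 0], [1, 0, 1]],
    [[0, 0, 1], [1, 1, 0]],
]
true_theta = [0.2, 0.5, 0.9]
targets = [render(true_theta, v) for v in views]  # image each 2D prior prefers

theta = generate(views, targets, n_params=3)
print([round(t, 4) for t in theta])
```

No single view determines all three parameters here, but the averaged 3D score does, so the ascent settles close to `true_theta`. In the actual method the renderer is nonlinear and the Jacobian-transpose product would come from automatic differentiation rather than a hand-written `vjp`.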

Quick Start & Requirements

  • Install: Install PyTorch for your CUDA version (e.g., pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116), then run pip install -r requirements.txt. Install taming-transformers manually (git clone --depth 1 git@github.com:CompVis/taming-transformers.git && pip install -e taming-transformers).
  • Checkpoints: Download the 12 GB tar file containing the required checkpoints (SD v1.5, gddpm), extract it, and update env.json to point to the extracted files.
  • Usage: Run experiments from a dedicated directory (e.g., mkdir exp && cd exp). A sample generation command is python /path/to/sjc/run_sjc.py --sd.prompt "A zoomed out high quality photo of Temple of Heaven" --n_steps 10000 --lr 0.05 --sd.scale 100.0.
  • Resources: On an A5000, 10,000 generation steps take about 25 minutes and 10 GB of GPU memory; high-resolution visualization takes about 5 minutes and 11 GB.
  • Docs: Usage examples and reproduction scripts are provided in the README.
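
For running several prompts, the sample command above can be assembled programmatically. This is a minimal sketch, assuming only the flags shown in the usage example (`--sd.prompt`, `--n_steps`, `--lr`, `--sd.scale`); the helper name `build_sjc_command` is hypothetical, and `/path/to/sjc/run_sjc.py` is the same placeholder path used above.

```python
import shlex


def build_sjc_command(prompt, n_steps=10000, lr=0.05, scale=100.0,
                      script="/path/to/sjc/run_sjc.py"):
    """Build the argument list for one SJC run (hypothetical helper).

    Only flags that appear in the sample command are used; the defaults
    mirror that command.
    """
    return [
        "python", script,
        "--sd.prompt", prompt,
        "--n_steps", str(n_steps),
        "--lr", str(lr),
        "--sd.scale", str(scale),
    ]


cmd = build_sjc_command("A zoomed out high quality photo of Temple of Heaven")
# shlex.join quotes the prompt so the line can be pasted into a shell.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True, cwd="exp")` would launch the run from the experiment directory without any manual shell quoting.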

Highlighted Details

  • Integrates with threestudio.
  • Includes implementations of Karras sampler and a voxel NeRF.
  • Offers a subpixel rendering script for higher quality visualizations.
  • Provides detailed example commands for generating various 3D assets (e.g., Trump, Temple of Heaven, School Bus).
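
As background on the Karras sampler mentioned above, the sketch below computes the noise-level schedule from Karras et al. (2022), which spaces levels uniformly in sigma^(1/rho). It is not taken from the repository: the function name `karras_sigmas` is hypothetical, the defaults (0.002, 80, rho = 7) are the ones given in that paper, and whether this repository uses the same values is an assumption.

```python
import math


def karras_sigmas(n, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Return n noise levels from sigma_max down to sigma_min.

    Levels are evenly spaced in sigma**(1/rho), which concentrates
    steps at low noise. Defaults are assumptions, not repository values.
    """
    if n < 2:
        raise ValueError("need at least two noise levels")
    lo = sigma_min ** (1.0 / rho)
    hi = sigma_max ** (1.0 / rho)
    return [(hi + i / (n - 1) * (lo - hi)) ** rho for i in range(n)]


sigmas = karras_sigmas(10)
print([round(s, 4) for s in sigmas])
```

A larger rho shifts more of the steps toward small noise levels, where fine image detail is resolved.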

Maintenance & Community

  • The accompanying paper was published at CVPR 2023.
  • Mentions integration into threestudio.
  • No specific community links (Discord/Slack) or active maintenance signals are provided in the README.

Licensing & Compatibility

  • Released under Stable Diffusion's OpenRAIL license due to its use of SD.
  • No other restrictive licensing components are identified.

Limitations & Caveats

  • Seeds are currently hardcoded to 0.
  • Scripts to reproduce 2D experiments (Fig 4) are pending.
  • Main paper figures are not yet consistent with appendix figures (which used subpixel rendering).
  • DreamBooth integration is noted as not ready, with potential issues like multi-face generation and guidance scale tuning.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 4 stars in the last 90 days
