sjc by pals-ttic

Research code for 3D generation from 2D diffusion models

Created 2 years ago
518 stars

Top 60.6% on SourcePulse

Project Summary

This repository implements Score Jacobian Chaining (SJC), a method for generating 3D assets by leveraging pretrained 2D diffusion models. It targets researchers and practitioners in computer vision and graphics interested in 3D generation from 2D priors, offering a novel approach to adapt powerful 2D models for 3D tasks.

How It Works

SJC applies the chain rule to a diffusion model's learned score function, backpropagating it through the Jacobian of a differentiable renderer (specifically, a voxel radiance field). This aggregates 2D scores from many camera viewpoints into a single score on the 3D parameters, so an existing 2D model can drive 3D generation without 3D training data. A key contribution is a new estimation mechanism (Perturb-and-Average Scoring) that addresses a distribution mismatch: the diffusion model is trained on noisy images, whereas the renderer produces noise-free ones.
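The chain-rule step can be illustrated with a toy sketch. It assumes a hypothetical linear "renderer" x = A @ theta per camera (standing in for the voxel radiance field) and an isotropic Gaussian as the 2D prior (standing in for Stable Diffusion), so every name below is illustrative rather than part of SJC's code. Because the Jacobian of a linear renderer is A, each view contributes the vector-Jacobian product A.T @ score(x); the finite-difference check confirms that averaging these equals the gradient of the average 2D log-density.

```python
import numpy as np

def gaussian_log_density(x, mu, sigma):
    # log N(x; mu, sigma^2 I), up to an additive constant (toy 2D prior)
    return -0.5 * np.sum((x - mu) ** 2) / sigma**2

def gaussian_score(x, mu, sigma):
    # grad_x log N(x; mu, sigma^2 I): the "2D score" in this toy
    return -(x - mu) / sigma**2

def aggregated_log_density(theta, renderers, mu, sigma):
    # Average the 2D log-density over all camera views
    return np.mean([gaussian_log_density(A @ theta, mu, sigma) for A in renderers])

def sjc_score(theta, renderers, mu, sigma):
    # Chain rule: for x = A @ theta the Jacobian dx/dtheta is A,
    # so each view contributes A.T @ score(x); average over views.
    total = np.zeros_like(theta)
    for A in renderers:
        total += A.T @ gaussian_score(A @ theta, mu, sigma)
    return total / len(renderers)

rng = np.random.default_rng(0)
n_params, n_pixels, n_views = 8, 4, 16
renderers = [rng.normal(size=(n_pixels, n_params)) for _ in range(n_views)]
mu, sigma = np.ones(n_pixels), 1.0
theta = rng.normal(size=n_params)

# Finite-difference check of the chained score against the aggregated objective
eps = 1e-6
fd = np.array([
    (aggregated_log_density(theta + eps * e, renderers, mu, sigma)
     - aggregated_log_density(theta - eps * e, renderers, mu, sigma)) / (2 * eps)
    for e in np.eye(n_params)
])
print("max |chained - finite diff|:", np.abs(sjc_score(theta, renderers, mu, sigma) - fd).max())

# Gradient ascent on the 3D parameters using only chained 2D scores
before = aggregated_log_density(theta, renderers, mu, sigma)
for _ in range(200):
    theta = theta + 0.01 * sjc_score(theta, renderers, mu, sigma)
after = aggregated_log_density(theta, renderers, mu, sigma)
print(f"mean log-density: {before:.2f} -> {after:.2f}")
```

In the real method the Jacobian is never formed explicitly; automatic differentiation computes the same vector-Jacobian product through the renderer.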

Quick Start & Requirements

  • Install: Install PyTorch for your CUDA version (e.g., pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116), then run pip install -r requirements.txt. Install taming-transformers manually (git clone --depth 1 git@github.com:CompVis/taming-transformers.git && pip install -e taming-transformers).
  • Checkpoints: Download a single 12GB tar file containing the required checkpoints (SD v1.5 and gddpm), then update env.json to point to the directory where you uncompressed them.
  • Usage: Run experiments from a dedicated directory (e.g., mkdir exp && cd exp). A sample generation command is python /path/to/sjc/run_sjc.py --sd.prompt "A zoomed out high quality photo of Temple of Heaven" --n_steps 10000 --lr 0.05 --sd.scale 100.0.
  • Resources: A 10,000-step generation takes about 25 minutes and 10GB of GPU memory on an A5000. High-resolution visualization takes about 5 minutes and 11GB on the same GPU.
  • Docs: Usage examples and reproduction scripts are provided in the README.
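For scripted runs, the sample command can be assembled as an argument list, which avoids shell-quoting problems with prompts. This is a sketch that uses only the flags shown in the sample command; `build_sjc_command`, `estimated_minutes`, and `launch` are hypothetical helpers, and the time estimate assumes runtime scales linearly with step count from the A5000 figure above.

```python
import shlex
import subprocess
from pathlib import Path

def build_sjc_command(sjc_root, prompt, n_steps=10_000, lr=0.05, guidance_scale=100.0):
    # Only the flags that appear in the sample command are used here.
    return [
        "python", str(Path(sjc_root) / "run_sjc.py"),
        "--sd.prompt", prompt,
        "--n_steps", str(n_steps),
        "--lr", str(lr),
        "--sd.scale", str(guidance_scale),
    ]

def estimated_minutes(n_steps, ref_steps=10_000, ref_minutes=25.0):
    # Assumption: wall-clock time scales linearly with the number of steps.
    return ref_minutes * n_steps / ref_steps

def launch(cmd, exp_dir="exp"):
    # Mirrors "mkdir exp && cd exp": run from a dedicated experiment directory.
    exp_dir = Path(exp_dir)
    exp_dir.mkdir(exist_ok=True)
    subprocess.run(cmd, cwd=exp_dir, check=True)

cmd = build_sjc_command("/path/to/sjc", "A zoomed out high quality photo of Temple of Heaven")
print(shlex.join(cmd))
print(f"~{estimated_minutes(10_000):.0f} min for 10,000 steps")
```

Calling `launch(cmd)` only makes sense once the checkpoints are downloaded and env.json is updated.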

Highlighted Details

  • Has been integrated into threestudio.
  • Includes implementations of the Karras sampler and a voxel NeRF.
  • Offers a subpixel rendering script for higher-quality visualizations.
  • Provides detailed example commands for generating various 3D assets (e.g., Trump, Temple of Heaven, School Bus).
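Two of the components listed above rest on standard formulas, sketched here for orientation. The first function is the noise-level schedule from Karras et al. (2022), which interpolates linearly in sigma^(1/rho); the defaults (sigma_min=0.002, sigma_max=80, rho=7) are that paper's values and are used only for illustration, not taken from SJC's configuration. The second is generic NeRF-style alpha compositing along one ray, the kind of rendering a voxel radiance field performs; it is not claimed to match SJC's exact implementation.

```python
import numpy as np

def karras_sigmas(n, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    # Karras et al. (2022): interpolate linearly in sigma^(1/rho), which
    # concentrates steps near sigma_min. (EDM also appends a final sigma of 0.)
    ramp = np.linspace(0.0, 1.0, n)
    min_inv, max_inv = sigma_min ** (1 / rho), sigma_max ** (1 / rho)
    return (max_inv + ramp * (min_inv - max_inv)) ** rho

def composite_ray(densities, colors, deltas):
    # Alpha compositing of samples along one ray, e.g. densities and colors
    # interpolated from a voxel grid at each sample point.
    alphas = 1.0 - np.exp(-densities * deltas)
    # Transmittance: probability the ray reaches sample i unoccluded.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    weights = alphas * trans
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights

sigmas = karras_sigmas(10)
print(np.round(sigmas, 3))

densities = np.array([0.0, 0.5, 5.0, 0.1])
colors = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1]], dtype=float)
rgb, w = composite_ray(densities, colors, deltas=np.full(4, 0.25))
print(rgb, w.sum())
```

The weights along a ray sum to 1 minus the final transmittance, so any remaining mass can be assigned to a background color.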

Maintenance & Community

  • The accompanying paper was published at CVPR 2023.
  • The README notes that SJC has been integrated into threestudio.
  • No specific community links (Discord/Slack) or active maintenance signals are provided in the README.

Licensing & Compatibility

  • Released under Stable Diffusion's OpenRAIL license, since the project builds on Stable Diffusion.
  • No other restrictive licensing components are identified.

Limitations & Caveats

  • Seeds are currently hardcoded to 0.
  • Scripts to reproduce the 2D experiments (Fig. 4) are pending.
  • Main paper figures are not yet consistent with the appendix figures, which used subpixel rendering.
  • DreamBooth integration is not yet ready; known issues include multi-face generation and the need to tune the guidance scale.
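Because seeds are hardcoded to 0, every run with the same settings starts from the same random state. A common pattern for exploring variation is to seed all RNGs from one configurable value. This is a generic sketch: `seed_everything` and the `SJC_SEED` environment variable are hypothetical, SJC does not read them, and wiring them in would mean editing the place where SJC currently sets its seed.

```python
import os
import random

import numpy as np

def seed_everything(seed: int) -> None:
    # Hypothetical helper: seed the RNGs a PyTorch pipeline typically touches.
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
        torch.manual_seed(seed)  # seeds the CPU and all CUDA devices
    except ImportError:
        pass  # torch not installed; stdlib and NumPy RNGs are still seeded

# Hypothetical env var; falling back to 0 reproduces the hardcoded default.
seed = int(os.environ.get("SJC_SEED", "0"))
seed_everything(seed)
print(random.random(), np.random.rand())
```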

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 0 stars in the last 30 days

Explore Similar Projects

  • threestudio by threestudio-project: Framework for 3D content generation from text/images using 2D diffusion. Top 0.2%, 7k stars; created 2 years ago, updated 9 months ago. Starred by Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), Luis Capelo (cofounder of Lightning AI), and 6 more.
  • stable-dreamfusion by ashawkey: Text-to-3D model using NeRF and diffusion. Top 0.1%, 9k stars; created 2 years ago, updated 1 year ago. Starred by Yaowei Zheng (author of LLaMA-Factory), Chip Huyen (author of "AI Engineering", "Designing Machine Learning Systems"), and 13 more.