zero123  by cvlab-columbia

Research code for zero-shot single-image to 3D object generation

created 2 years ago
2,924 stars

Top 16.7% on sourcepulse

Project Summary

This repository provides Zero-1-to-3, a novel method for generating 3D objects from a single input image. It addresses the challenge of zero-shot novel view synthesis and 3D reconstruction, targeting researchers and developers in computer vision and graphics. The primary benefit is enabling high-quality 3D asset creation from minimal input.

How It Works

Zero-1-to-3 leverages a finetuned Stable Diffusion model to generate novel views of an object from a single input image. It explicitly conditions generation on the relative camera pose change and is trained on a large dataset of 3D object renderings (Objaverse). This mitigates the "Janus problem" (the multi-face viewpoint ambiguity common in pipelines built on text-to-image models) by keeping synthesized viewpoints consistent, which in turn facilitates 3D reconstruction.
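The relative-pose conditioning described above can be sketched as follows. This is a minimal illustration, assuming the conditioning is the 4-dimensional relative pose vector (Δelevation, sin Δazimuth, cos Δazimuth, Δradius) described in the paper; the function name is ours, not the repo's:

```python
import math

def relative_pose_embedding(d_elevation_deg: float,
                            d_azimuth_deg: float,
                            d_radius: float) -> list:
    """Build a 4-d relative camera pose vector:
    (Δelevation in radians, sin Δazimuth, cos Δazimuth, Δradius).
    Using sin/cos for azimuth keeps the 0°/360° wrap-around continuous.
    """
    theta = math.radians(d_elevation_deg)
    phi = math.radians(d_azimuth_deg)
    return [theta, math.sin(phi), math.cos(phi), d_radius]

# e.g. orbit the camera 90 degrees around the object,
# keeping the same elevation and distance:
emb = relative_pose_embedding(0.0, 90.0, 0.0)
```

In the actual model, a vector like this is injected alongside the CLIP image embedding of the input view to steer the diffusion process toward the requested viewpoint.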

Quick Start & Requirements

  • Install: Clone repo, create conda env (conda create -n zero123 python=3.9, conda activate zero123), pip install -r requirements.txt, pip install -e taming-transformers/, pip install -e CLIP/.
  • Dependencies: Python 3.9, PyTorch, CUDA. Requires downloading checkpoint weights (e.g., 105000.ckpt).
  • Resources: Demo requires ~22GB VRAM (RTX 3090/4090). Training script is optimized for an 8x A100 (80GB VRAM) system.
  • Links: Project Page, Live Demo, Weights.

Highlighted Details

  • Zero-shot novel view synthesis and 3D reconstruction from a single image.
  • Addresses the Janus problem through explicit camera pose modeling and large-scale dataset training.
  • Integrates with other projects like Threestudio and Stable-Dreamfusion for 3D reconstruction pipelines.
  • Offers multiple checkpoint weights trained for different iteration counts.

Maintenance & Community

  • Developed by Columbia University and Toyota Research Institute.
  • The README documents community integrations (e.g., Threestudio).
  • No explicit community links (Discord/Slack) are provided in the README.

Licensing & Compatibility

  • Dataset (Objaverse Renderings) is released under ODC-By 1.0 license. Individual object licenses follow Objaverse's creative commons licenses.
  • Code is based on Stable Diffusion, Objaverse, and SJC; licensing for these underlying components should be considered for commercial use.

Limitations & Caveats

The training script is preliminary and configured for an 8x A100 system, requiring adjustments for smaller GPU setups. Hyperparameters for 3D reconstruction are not extensively tuned.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 49 stars in the last 90 days
