zero123  by cvlab-columbia

Research code for zero-shot single-image to 3D object generation

created 2 years ago
2,924 stars

Top 16.7% on sourcepulse

Project Summary

This repository provides Zero-1-to-3, a novel method for generating 3D objects from a single input image. It addresses the challenge of zero-shot novel view synthesis and 3D reconstruction, targeting researchers and developers in computer vision and graphics. The primary benefit is enabling high-quality 3D asset creation from minimal input.

How It Works

Zero-1-to-3 leverages a finetuned Stable Diffusion model to generate novel views of an object from a single input image. It explicitly conditions generation on the relative camera pose change and is trained on a large dataset of 3D object renderings (Objaverse). This mitigates the "Janus problem" (the multi-face viewpoint ambiguity common in pipelines built on text-to-image models) by keeping synthesized viewpoints consistent, which in turn facilitates 3D reconstruction.
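The relative-pose conditioning described above can be sketched as follows. This is a minimal illustration, assuming the conditioning is the 4-dimensional relative pose vector (Δelevation, sin Δazimuth, cos Δazimuth, Δradius) described in the paper; the function name is ours, not the repo's:

```python
import math

def relative_pose_embedding(d_elevation_deg: float,
                            d_azimuth_deg: float,
                            d_radius: float) -> list:
    """Build a 4-d relative camera pose vector:
    (Δelevation in radians, sin Δazimuth, cos Δazimuth, Δradius).
    Using sin/cos for azimuth keeps the 0°/360° wrap-around continuous.
    """
    theta = math.radians(d_elevation_deg)
    phi = math.radians(d_azimuth_deg)
    return [theta, math.sin(phi), math.cos(phi), d_radius]

# e.g. orbit the camera 90 degrees around the object,
# keeping the same elevation and distance:
emb = relative_pose_embedding(0.0, 90.0, 0.0)
```

In the actual model, a vector like this is injected alongside the CLIP image embedding of the input view to steer the diffusion process toward the requested viewpoint.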

Quick Start & Requirements

  • Install: Clone repo, create conda env (conda create -n zero123 python=3.9, conda activate zero123), pip install -r requirements.txt, pip install -e taming-transformers/, pip install -e CLIP/.
  • Dependencies: Python 3.9, PyTorch, CUDA. Requires downloading checkpoint weights (e.g., 105000.ckpt).
  • Resources: Demo requires ~22GB VRAM (RTX 3090/4090). Training script is optimized for an 8x A100 (80GB VRAM) system.
  • Links: Project Page, Live Demo, Weights.

Highlighted Details

  • Zero-shot novel view synthesis and 3D reconstruction from a single image.
  • Addresses the Janus problem through explicit camera pose modeling and large-scale dataset training.
  • Integrates with other projects like Threestudio and Stable-Dreamfusion for 3D reconstruction pipelines.
  • Offers multiple checkpoint weights trained for different iteration counts.

Maintenance & Community

  • Developed by Columbia University and Toyota Research Institute.
  • The README documents community integrations (e.g., Threestudio).
  • No explicit community links (Discord/Slack) are provided in the README.

Licensing & Compatibility

  • Dataset (Objaverse Renderings) is released under ODC-By 1.0 license. Individual object licenses follow Objaverse's creative commons licenses.
  • Code is based on Stable Diffusion, Objaverse, and SJC; licensing for these underlying components should be considered for commercial use.

Limitations & Caveats

The training script is preliminary and configured for an 8x A100 system, requiring adjustments for smaller GPU setups. Hyperparameters for 3D reconstruction are not extensively tuned.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 49 stars in the last 90 days
