stable-diffusion-webui-depthmap-script by thygate

WebUI script for depth map creation and manipulation

created 2 years ago
1,815 stars

Top 24.3% on sourcepulse

Project Summary

This extension for AUTOMATIC1111's Stable Diffusion WebUI generates high-resolution depth maps, normal maps, and 3D stereo image pairs from images. It targets Stable Diffusion WebUI users and 3D content creators, enabling advanced image manipulation and 3D asset generation.

How It Works

The script leverages multiple state-of-the-art monocular depth estimation models, including Marigold, MiDaS, and ZoeDepth, to produce realistic depth maps. It employs multi-resolution merging (BoostingMonocularDepth) for enhanced detail and resolution. For 3D mesh generation, it utilizes Context-aware Layered Depth Inpainting, allowing for video rendering from the generated 3D scene.
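The multi-resolution merging idea can be illustrated with a toy sketch: take coarse structure from a low-resolution depth estimate and fine detail from a high-resolution one. This is not the BoostingMonocularDepth algorithm itself; `merge_depth_estimates` and its parameters are invented here for illustration only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def box_blur(a: np.ndarray, k: int) -> np.ndarray:
    """Mean filter with edge padding; k must be odd."""
    padded = np.pad(a.astype(np.float64), k // 2, mode="edge")
    return sliding_window_view(padded, (k, k)).mean(axis=(2, 3))

def merge_depth_estimates(low_res: np.ndarray, high_res: np.ndarray, k: int = 3) -> np.ndarray:
    """Toy merge: coarse structure from the low-res estimate,
    high-frequency detail from the high-res estimate."""
    detail = high_res - box_blur(high_res, k)           # high-frequency residual
    scale = high_res.shape[0] // low_res.shape[0]
    coarse = np.kron(low_res, np.ones((scale, scale)))  # nearest-neighbor upsample
    return coarse + detail
```

The real pipeline merges many overlapping patch estimates with learned blending; this sketch only shows why combining two resolutions recovers both global layout and local detail.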

Quick Start & Requirements

  • Installation: Install as an extension from within Stable Diffusion WebUI (Extensions tab -> Available -> Load from -> install "Depth Maps"), or via Install from URL: https://github.com/thygate/stable-diffusion-webui-depthmap-script.
  • Dependencies: Requires Stable Diffusion WebUI. Models are downloaded automatically. GPU acceleration is supported and recommended for performance.
  • Standalone: Clone the repo, install the dependencies from requirements.txt, and run main.py.
  • Documentation: Wiki
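The standalone path above amounts to a short shell session (a sketch, assuming an existing Python environment with pip; the entry point main.py is taken from the Quick Start):

```shell
# Standalone usage sketch (no WebUI required)
git clone https://github.com/thygate/stable-diffusion-webui-depthmap-script
cd stable-diffusion-webui-depthmap-script
pip install -r requirements.txt
python main.py   # models are downloaded automatically on first run
```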

Highlighted Details

  • Supports depth map generation using Marigold, MiDaS (multiple variants), ZoeDepth, and LeReS.
  • Enables creation of 3D stereo image pairs (side-by-side, anaglyph) and normal maps.
  • Integrates with Rembg for background removal and supports video processing.
  • Generates 3D meshes using 3D-Photo-Inpainting for video rendering.
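Side-by-side stereo pairs are typically produced by shifting pixels horizontally in proportion to depth, and red-cyan anaglyphs by mixing the two views' channels. The numpy sketch below illustrates the principle only; the extension's actual generator handles occlusion inpainting and offers more options, and `max_shift` is an illustrative parameter, not one of the script's settings.

```python
import numpy as np

def stereo_pair(image: np.ndarray, depth: np.ndarray, max_shift: int = 8) -> np.ndarray:
    """Side-by-side stereo pair via depth-proportional horizontal pixel shifts.

    image: (H, W, 3) uint8; depth: (H, W), larger = closer. Occlusion holes
    simply keep the original pixel (real implementations inpaint them).
    """
    h, w = depth.shape
    d = depth.astype(np.float64)
    d = (d - d.min()) / (np.ptp(d) + 1e-8)       # normalize to [0, 1]
    shift = np.rint(d * max_shift).astype(int)   # per-pixel disparity
    left, right = image.copy(), image.copy()
    cols = np.arange(w)
    for y in range(h):
        left[y, np.clip(cols + shift[y], 0, w - 1)] = image[y, cols]
        right[y, np.clip(cols - shift[y], 0, w - 1)] = image[y, cols]
    return np.hstack((left, right))

def anaglyph(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Red-cyan anaglyph: red channel from the left eye, green/blue from the right."""
    out = right.copy()
    out[..., 0] = left[..., 0]
    return out
```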

Maintenance & Community

  • Actively developed with contributions from multiple users.
  • Community interaction via GitHub Issues and Discussions.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README. Users should verify licensing for underlying models and code.

Limitations & Caveats

  • Stereoscopic image generation is CPU-bound.
  • 3D mesh generation can be time-consuming, potentially taking up to an hour for large images.
  • The README notes potential VRAM issues with certain models and the "Boost" feature; lower-VRAM GPUs may need parameter adjustments.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 32 stars in the last 90 days
