EnVision-Research: Deterministic video depth estimation framework
Deterministic Video Depth (DVD) addresses the trade-off between semantic ambiguity in discriminative video depth estimation models and temporal flickering/hallucinations in generative models. It offers a deterministic framework by adapting pre-trained video diffusion models into single-pass depth regressors. This approach benefits researchers and practitioners seeking highly detailed, temporally stable, and data-efficient video depth estimation solutions.
How It Works
DVD breaks the ambiguity-hallucination dilemma by adapting generative video diffusion models into deterministic depth regressors. It strips away generative stochasticity, uniting the semantic priors of diffusion models with the structural stability of discriminative methods. Key innovations include Latent Manifold Rectification (LMR) for improved structural fidelity and boundary precision, and a training-free Global Affine Coherence (GAC) module that enables seamless long-video inference with minimal scale drift.
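The core idea behind training-free affine alignment across video windows can be illustrated with a minimal sketch: depth is predicted per overlapping window, each window is fit to the running result with a least-squares scale/shift on the overlap, and the overlap is blended. This is an illustrative reconstruction of the general technique, not DVD's actual GAC implementation; the function names and blending scheme below are assumptions.

```python
import numpy as np

def fit_affine(src, ref):
    # Least-squares scale a and shift b so that a*src + b ≈ ref
    # on the overlapping frames (closed-form via degree-1 polyfit).
    a, b = np.polyfit(src.ravel(), ref.ravel(), 1)
    return a, b

def stitch_windows(windows, overlap):
    # windows: list of (T, H, W) per-window depth predictions, where
    # consecutive windows share `overlap` frames. Each new window is
    # affinely aligned to the running result, then the overlap is blended.
    result = windows[0].astype(np.float64)
    for clip in windows[1:]:
        clip = clip.astype(np.float64)
        a, b = fit_affine(clip[:overlap], result[-overlap:])
        clip = a * clip + b
        blended = 0.5 * (result[-overlap:] + clip[:overlap])
        result = np.concatenate([result[:-overlap], blended, clip[overlap:]])
    return result
```

With exact affine distortions the second window is recovered perfectly; in practice the fit is only approximate, which is why a global, training-free alignment keeps scale drift small over long videos.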
Quick Start & Requirements
Installation involves cloning the repository, creating a Conda environment with Python 3.10, and installing the package in editable mode (`pip install -e .`). For faster inference, `pip install sageattention` is recommended, though it should not be used for training. Pre-trained weights must be downloaded from Hugging Face via `huggingface-cli login` and `huggingface-cli download`. Links to the project page, arXiv paper, and Hugging Face model/demo are provided.
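The setup steps above can be sketched as a shell session. The repository URL, environment name, and weights repository are placeholders — consult the project README for the exact values.

```shell
# Illustrative setup, following the steps described above.
git clone https://github.com/EnVision-Research/DVD.git   # repo URL assumed
cd DVD
conda create -n dvd python=3.10 -y
conda activate dvd
pip install -e .
pip install sageattention        # optional inference speedup; skip for training

# Fetch the pre-trained weights (repo name given on the project page).
huggingface-cli login
huggingface-cli download <weights-repo>
```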
Highlighted Details
Maintenance & Community
The project has seen recent activity with the release of its paper, project page, pre-trained weights, and inference code. A ComfyUI node has been contributed by the community, indicating active development and integration efforts.
Licensing & Compatibility
DVD employs a split-license strategy: the source code is released under the permissive Apache 2.0 License. However, the pre-trained model weights are distributed under the CC BY-NC 4.0 License, which strictly restricts usage to non-commercial, academic, and research purposes.
Limitations & Caveats
The primary limitation is the non-commercial restriction imposed by the CC BY-NC 4.0 license on the pre-trained model weights, precluding their use in commercial applications. The online Gradio demo is limited to processing videos up to 5 seconds due to GPU resource constraints. Potential installation issues may arise from dependencies like PyTorch, sentencepiece, cmake, and cupy.