Novel view synthesis via a diffusion model (research paper and code)
Stable Virtual Camera (Seva) is a 1.3B-parameter diffusion model for Novel View Synthesis (NVS): it generates 3D-consistent novel views from an arbitrary number of input views and arbitrary target camera configurations. It is aimed at researchers and power users working on generative view synthesis and 3D scene reconstruction.
How It Works
Seva uses a generalist diffusion architecture, trained on a large dataset, to synthesize new views directly. Its core advantage is flexibility: it accepts a varying number of input views and arbitrary target camera poses, going beyond traditional NVS methods that fix one or both. This lets it generate high-quality, consistent views without requiring explicit 3D scene reconstruction.
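Concretely, "target camera configurations" are per-view camera poses along a desired trajectory. The sketch below builds a simple orbit trajectory with NumPy as an illustration of the pose inputs an NVS model conditions on; it is not Seva's actual API, and the repository's pose conventions (world-to-camera vs. camera-to-world, axis order) may differ.

```python
import numpy as np

def orbit_camera_poses(n_views: int, radius: float = 2.0, height: float = 0.5):
    """Circular camera trajectory around the origin, as 4x4 world-to-camera matrices.

    Illustrative only: check Seva's repository for its exact pose convention.
    """
    poses = []
    world_up = np.array([0.0, 1.0, 0.0])
    for theta in np.linspace(0.0, 2.0 * np.pi, n_views, endpoint=False):
        eye = np.array([radius * np.cos(theta), height, radius * np.sin(theta)])
        forward = -eye / np.linalg.norm(eye)          # camera looks at the origin
        right = np.cross(world_up, forward)
        right /= np.linalg.norm(right)
        up = np.cross(forward, right)
        w2c = np.eye(4)
        w2c[:3, :3] = np.stack([right, up, forward])  # rotation rows = camera axes
        w2c[:3, 3] = -w2c[:3, :3] @ eye               # move world origin into view space
        poses.append(w2c)
    return poses

# e.g. 21 target views for a turntable-style render around a scene
target_poses = orbit_camera_poses(n_views=21)
```

Each input view carries an analogous pose, and the model is asked to produce one output image per target pose.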
Quick Start & Requirements
After cloning the repository, install in editable mode:

pip install -e .
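For orientation, the mock below shows the call pattern at the shape level: a few posed input images in, one generated image per target pose out. Every name here (`fake_nvs_sampler`, the 576x576 resolution) is a placeholder rather than the repository's real entry point; consult its README for the actual demo scripts.

```python
import numpy as np

def fake_nvs_sampler(input_images, input_poses, target_poses):
    """Stand-in for the diffusion sampler (placeholder, not Seva's API).

    The real model conditions on the posed input views and denoises one
    3D-consistent image per target pose; this mock returns noise with
    matching shapes just to show the call pattern.
    """
    h, w, c = input_images[0].shape
    return [np.random.rand(h, w, c) for _ in target_poses]

# Two posed input views and eight requested target cameras
# (identity poses used purely as stand-ins here).
input_images = [np.random.rand(576, 576, 3) for _ in range(2)]
input_poses = [np.eye(4) for _ in range(2)]
target_poses = [np.eye(4) for _ in range(8)]

novel_views = fake_nvs_sampler(input_images, input_poses, target_poses)
assert len(novel_views) == len(target_poses)  # one image per requested camera
```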
Highlighted Details
Maintenance & Community
The project is associated with Stability AI and the University of Oxford. Discussions regarding training scripts and output licensing are ongoing in GitHub issues.
Licensing & Compatibility
Model outputs are subject to the same non-commercial license as the model itself, which may restrict use in commercial or closed-source applications.
Limitations & Caveats
Flash Attention is not supported on native Windows, so WSL is required there. The output license restricts commercial use. Training scripts are still under development via community pull requests.