Novel view synthesis via a diffusion model (research paper and code)
Stable Virtual Camera (Seva) is a 1.3B-parameter diffusion model for Novel View Synthesis (NVS): it generates 3D-consistent novel views from an arbitrary number of input views and arbitrary target camera configurations. It is aimed at researchers and power users working on generative view synthesis and 3D scene reconstruction.
How It Works
Seva uses a generalist diffusion architecture, trained on a large dataset, to synthesize new views directly. Its core advantage is flexibility: it accepts a varying number of input views and arbitrary target camera poses, going beyond traditional NVS methods that fix one or both. This lets it generate high-quality, consistent views without requiring explicit 3D scene reconstruction.
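Concretely, "target camera configurations" are per-view camera poses along a desired trajectory. The sketch below builds a simple orbit trajectory with NumPy as an illustration of the pose inputs an NVS model conditions on; it is not Seva's actual API, and the repository's pose conventions (world-to-camera vs. camera-to-world, axis order) may differ.

```python
import numpy as np

def orbit_camera_poses(n_views: int, radius: float = 2.0, height: float = 0.5):
    """Circular camera trajectory around the origin, as 4x4 world-to-camera matrices.

    Illustrative only: check Seva's repository for its exact pose convention.
    """
    poses = []
    world_up = np.array([0.0, 1.0, 0.0])
    for theta in np.linspace(0.0, 2.0 * np.pi, n_views, endpoint=False):
        eye = np.array([radius * np.cos(theta), height, radius * np.sin(theta)])
        forward = -eye / np.linalg.norm(eye)          # camera looks at the origin
        right = np.cross(world_up, forward)
        right /= np.linalg.norm(right)
        up = np.cross(forward, right)
        w2c = np.eye(4)
        w2c[:3, :3] = np.stack([right, up, forward])  # rotation rows = camera axes
        w2c[:3, 3] = -w2c[:3, :3] @ eye               # move world origin into view space
        poses.append(w2c)
    return poses

# e.g. 21 target views for a turntable-style render around a scene
target_poses = orbit_camera_poses(n_views=21)
```

Each input view carries an analogous pose, and the model is asked to produce one output image per target pose.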
Quick Start & Requirements
After cloning the repository, install in editable mode:

pip install -e .
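For orientation, the mock below shows the call pattern at the shape level: a few posed input images in, one generated image per target pose out. Every name here (`fake_nvs_sampler`, the 576x576 resolution) is a placeholder rather than the repository's real entry point; consult its README for the actual demo scripts.

```python
import numpy as np

def fake_nvs_sampler(input_images, input_poses, target_poses):
    """Stand-in for the diffusion sampler (placeholder, not Seva's API).

    The real model conditions on the posed input views and denoises one
    3D-consistent image per target pose; this mock returns noise with
    matching shapes just to show the call pattern.
    """
    h, w, c = input_images[0].shape
    return [np.random.rand(h, w, c) for _ in target_poses]

# Two posed input views and eight requested target cameras
# (identity poses used purely as stand-ins here).
input_images = [np.random.rand(576, 576, 3) for _ in range(2)]
input_poses = [np.eye(4) for _ in range(2)]
target_poses = [np.eye(4) for _ in range(8)]

novel_views = fake_nvs_sampler(input_images, input_poses, target_poses)
assert len(novel_views) == len(target_poses)  # one image per requested camera
```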
Highlighted Details
Maintenance & Community
The project is associated with Stability AI and the University of Oxford. Discussions regarding training scripts and output licensing are ongoing in GitHub issues.
Licensing & Compatibility
Model outputs are subject to the same non-commercial license as the model itself, which may restrict use in commercial or closed-source applications.
Limitations & Caveats
Flash Attention is not supported on native Windows, so WSL is required there. The output license restricts commercial use. Training scripts are still under development via community pull requests.