Generative models SDK for video, image, and 3D synthesis research
This repository provides Stability AI's generative models, including Stable Diffusion XL (SDXL) for text-to-image and Stable Video Diffusion (SVD) for image-to-video synthesis. It also features newer models like SDXL-Turbo and SV3D for faster generation and multi-view synthesis, targeting researchers and developers in AI art and video generation.
How It Works
The core architecture is config-driven, allowing modular assembly of submodules via `instantiate_from_config()`. It adopts a "denoiser framework" for both training and inference, unifying discrete- and continuous-time models. Conditioning inputs (text, class, spatial) are handled by a single `GeneralConditioner` class, and guiders are separated from samplers for greater flexibility.
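To make the config-driven pattern concrete, here is a minimal sketch of how a helper like `instantiate_from_config()` typically works (the repo's actual implementation may differ in details): a config dict names a `target` class by dotted path and supplies `params` as keyword arguments.

```python
import importlib

def instantiate_from_config(config: dict):
    """Resolve config["target"] to an importable object and call it
    with config["params"] as keyword arguments."""
    module_name, obj_name = config["target"].rsplit(".", 1)
    cls = getattr(importlib.import_module(module_name), obj_name)
    return cls(**config.get("params", {}))

# Toy usage with a stdlib class standing in for a model submodule:
cfg = {"target": "datetime.timedelta", "params": {"days": 2}}
delta = instantiate_from_config(cfg)
```

This is what lets the YAML configs swap encoders, denoisers, and samplers without code changes: every submodule is just a `target` path plus its constructor arguments.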
Quick Start & Requirements
- Install with `pip install .` (after cloning and setting up a Python 3.10 virtual environment with `pip install -r requirements/pt2.txt`).
- Download model weights (e.g., `sv4d.safetensors`, `sv3d_u.safetensors`) to a `checkpoints/` directory.
- Run sampling, e.g. `python scripts/sampling/simple_video_sample_4d.py --input_path <path/to/video>`.
- Demo scripts are available in `scripts/demo/`
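Since the sampling scripts fail late if weights are absent, a small preflight check can save a run. This helper is hypothetical (not part of the repo) and only checks for the checkpoint filenames named above:

```python
from pathlib import Path

def missing_checkpoints(ckpt_dir="checkpoints",
                        required=("sv4d.safetensors", "sv3d_u.safetensors")):
    # Hypothetical helper: report which of the weight files mentioned in
    # the README are not yet present in the checkpoints/ directory.
    root = Path(ckpt_dir)
    return [name for name in required if not (root / name).is_file()]
```

Calling it before launching a script returns the list of files still to download, so an empty list means the directory layout matches what the samplers expect.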
Highlighted Details
SV3D is available in `SV3D_u` (orbital videos) and `SV3D_p` (specified camera paths) variants.

Maintenance & Community
The project is actively developed by Stability AI. Links to project pages, tech reports, and video summaries are provided for specific models.
Licensing & Compatibility
Models are released under various licenses, including CreativeML Open RAIL++-M and research-specific licenses requiring application for access (e.g., SDXL-0.9). Commercial use may be restricted by these licenses.
Limitations & Caveats
The README notes potential Python version conflicts and that some training configurations require specific dataset formats (webdataset) and manual edits. Autoencoder training is limited to PyTorch 1.13. Access to certain model weights requires application and approval.
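For context on the webdataset requirement: a webdataset shard is a plain tar archive in which files sharing a basename form one training sample. A minimal sketch of writing such a shard with only the standard library (an assumed layout for illustration; the actual training configs may expect different keys and extensions):

```python
import io
import tarfile

def write_webdataset_shard(path, samples):
    # samples: list of (key, {extension: bytes}); files sharing a key
    # (e.g. "000000.jpg" and "000000.txt") form one sample.
    with tarfile.open(path, "w") as tar:
        for key, files in samples:
            for ext, data in sorted(files.items()):
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))

write_webdataset_shard(
    "shard-000000.tar",
    [("000000", {"jpg": b"<jpeg bytes>", "txt": b"a photo of a cat"})],
)
```

Shards like this can then be listed in a training config's data section; converting an existing image/caption dataset is mostly a matter of pairing each image file with a same-named text file.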