generative-models by Stability-AI

Generative models SDK for video, image, and 3D synthesis research

Created 2 years ago

26,800 stars

Top 1.4% on SourcePulse

10 Experts Love This Project

Jason Huggins

Creator of Selenium

Luis Capelo

Cofounder of Lightning AI

Pawel Garbacki

Cofounder of Fireworks AI

Omar Sanseviero

DevRel at Google DeepMind

and 6 more!

Project Summary

This repository provides Stability AI's generative models, including Stable Diffusion XL (SDXL) for text-to-image and Stable Video Diffusion (SVD) for image-to-video synthesis. It also features newer models like SDXL-Turbo and SV3D for faster generation and multi-view synthesis, targeting researchers and developers in AI art and video generation.

How It Works

The core architecture is config-driven: submodules are assembled by calling instantiate_from_config() on their configuration entries. Both training and inference use a "denoiser framework", which supports discrete-time and continuous-time models alike. Conditioning inputs (text, class, spatial) are handled by a single GeneralConditioner class. Guiders are kept separate from samplers, which makes the two easier to combine.
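The config-driven pattern described above can be sketched in a few lines. This is a minimal illustration and not the repository's own code: it assumes each config entry is a mapping with a "target" key (a dotted import path) and an optional "params" key (keyword arguments), and it uses a standard-library class as a stand-in target so that it runs without the repository installed.

```python
import importlib
from typing import Any, Mapping


def get_obj_from_str(dotted_path: str) -> Any:
    """Resolve a dotted path such as 'package.module.ClassName' to an object."""
    module_name, _, attr_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


def instantiate_from_config(config: Mapping[str, Any]) -> Any:
    """Build the object named by config['target'] with config['params'].

    Assumed config layout (a sketch, not confirmed against the repository):
    {"target": "<dotted path>", "params": {<keyword arguments>}}
    """
    if "target" not in config:
        raise KeyError("config needs a 'target' key naming the class to build")
    cls = get_obj_from_str(config["target"])
    return cls(**config.get("params", {}))


# Stand-in target from the standard library, chosen only for illustration.
config = {
    "target": "fractions.Fraction",
    "params": {"numerator": 3, "denominator": 4},
}
obj = instantiate_from_config(config)
print(type(obj).__name__, obj)
```

Because the class is named by a string, swapping one submodule for another becomes a config edit instead of a code change, which is what lets the pieces be mixed and matched.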

Quick Start & Requirements

Install: Clone the repository, create a Python 3.10 virtual environment, then run pip install -r requirements/pt2.txt followed by pip install . from the repository root.
Models: Download weights (e.g., sv4d.safetensors, sv3d_u.safetensors) to a checkpoints/ directory.
Inference: Run sampling scripts like python scripts/sampling/simple_video_sample_4d.py --input_path <path/to/video>.
Demos: Streamlit demos are available in scripts/demo/.
Dependencies: PyTorch 2.0 is recommended; PyTorch 1.13 is required for some autoencoder training configurations.
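Before launching a sampling run, it can help to confirm that the interpreter version and the weight files are in place. The sketch below is a hypothetical preflight helper, not part of the repository. The function names are invented for illustration, and the only script path and flag it uses are the ones quoted above.

```python
import sys
from pathlib import Path

# Script path and flag as quoted in the Quick Start text above.
SAMPLING_SCRIPT = "scripts/sampling/simple_video_sample_4d.py"


def python_matches(required=(3, 10)) -> bool:
    """Return True if the running interpreter is the expected major.minor."""
    return sys.version_info[:2] == tuple(required)


def missing_checkpoints(checkpoint_dir, names):
    """Return the weight files from `names` that are absent from checkpoint_dir."""
    root = Path(checkpoint_dir)
    return [name for name in names if not (root / name).is_file()]


def build_sampling_command(input_path):
    """Assemble the argument list for the sampling script (hypothetical helper)."""
    return [sys.executable, SAMPLING_SCRIPT, "--input_path", str(input_path)]


if __name__ == "__main__":
    if not python_matches():
        print("Warning: this interpreter is not Python 3.10")
    missing = missing_checkpoints("checkpoints", ["sv4d.safetensors", "sv3d_u.safetensors"])
    if missing:
        print("Missing weights:", ", ".join(missing))
    else:
        # Pass the list to subprocess.run(...) to actually start sampling.
        print("Ready to run:", build_sampling_command("<path/to/video>"))
```

The command is built as a list so it can go straight to subprocess.run without shell quoting issues.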

Highlighted Details

SV4D: Generates 40 frames (5 video frames x 8 camera views) at 576x576 resolution from 5 context frames and 8 reference views, with a novel sampling method for longer videos.
SV3D: Image-to-video model for novel multi-view synthesis, with SV3D_u (orbital videos) and SV3D_p (specified camera paths) variants.
SDXL-Turbo: A distilled text-to-image model built for fast, few-step generation.
SVD/SVD-XT: Image-to-video models generating 14 frames (SVD) or 25 frames (SVD-XT) at 576x1024.
Invisible Watermarking: Generated images carry an invisible watermark, and a detection script is provided.
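The frame counts and resolutions listed above translate into output sizes with simple arithmetic. The sketch below is illustrative only. OutputSpec is a hypothetical helper, the SV4D figure assumes the 40 frames form a grid of 5 video frames by 8 camera views, and the resolution pairs are copied as written without assuming which side is height or width.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OutputSpec:
    """Output dimensions for one model (hypothetical helper for illustration)."""
    name: str
    frames: int
    resolution: tuple  # the two side lengths in pixels, as listed above

    @property
    def pixels_per_frame(self) -> int:
        return self.resolution[0] * self.resolution[1]

    @property
    def pixels_per_clip(self) -> int:
        return self.frames * self.pixels_per_frame


# Assumption: SV4D's 40 frames are 5 video frames times 8 camera views.
SV4D = OutputSpec("SV4D", frames=5 * 8, resolution=(576, 576))
SVD = OutputSpec("SVD", frames=14, resolution=(576, 1024))
SVD_XT = OutputSpec("SVD-XT", frames=25, resolution=(576, 1024))

for spec in (SV4D, SVD, SVD_XT):
    print(f"{spec.name}: {spec.frames} frames, {spec.pixels_per_clip:,} pixels per clip")
```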

Maintenance & Community

The project is actively developed by Stability AI. Links to project pages, tech reports, and video summaries are provided for specific models.

Licensing & Compatibility

Models are released under various licenses, including CreativeML Open RAIL++-M and research-only licenses that require applying for access (e.g., SDXL-0.9). Commercial use may be restricted under these licenses.

Limitations & Caveats

The README notes possible version conflicts on Python versions other than 3.10, and that some training configurations expect datasets in webdataset format and need manual edits before they run. Some autoencoder training configurations require PyTorch 1.13. Access to certain model weights requires an application and approval.

Health Check

Last Commit

3 weeks ago

Responsiveness

1 week

Star History

140 stars in the last 30 days