generative-models by Stability-AI

Generative models SDK for video, image, and 3D synthesis research

created 2 years ago
26,244 stars

Top 1.6% on sourcepulse

Project Summary

This repository provides Stability AI's generative models, including Stable Diffusion XL (SDXL) for text-to-image synthesis and Stable Video Diffusion (SVD) for image-to-video synthesis. It also includes newer models such as SDXL-Turbo, for faster generation, and SV3D, for multi-view synthesis. The repository targets researchers and developers working on AI art and video generation.

How It Works

The core architecture is config-driven: submodules are declared in configuration files and assembled by calling instantiate_from_config(). Training and inference both use a "denoiser framework" that covers discrete-time and continuous-time models. All conditioning inputs (text, class, spatial) are handled by a single GeneralConditioner class. Guiders (for example, classifier-free guidance) are kept separate from samplers, so the two can be combined independently.
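
The config-driven assembly described above can be illustrated with a minimal sketch. This is not the repository's implementation: the "target" and "params" key names and the helper get_obj_from_str are assumptions made for illustration, and a standard-library class stands in for a model submodule.

```python
import importlib
from typing import Any, Mapping


def get_obj_from_str(path: str) -> Any:
    """Resolve a dotted path such as 'package.module.ClassName' to the object."""
    module_name, attr_name = path.rsplit(".", 1)
    module = importlib.import_module(module_name)
    return getattr(module, attr_name)


def instantiate_from_config(config: Mapping[str, Any]) -> Any:
    """Build the object named by config['target'], passing config['params'] as kwargs.

    The key names 'target' and 'params' are assumptions for this sketch.
    """
    if "target" not in config:
        raise KeyError("Expected key 'target' to instantiate.")
    cls = get_obj_from_str(config["target"])
    return cls(**config.get("params", {}))


# Hypothetical config: fractions.Fraction stands in for a real submodule class.
example_config = {
    "target": "fractions.Fraction",
    "params": {"numerator": 3, "denominator": 4},
}

obj = instantiate_from_config(example_config)
print(obj)  # prints "3/4"
```

With this pattern, swapping one submodule for another is a config edit (a different dotted path and parameter set) rather than a code change, which is the flexibility the config-driven design aims for.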

Quick Start & Requirements

  • Install: Clone the repository, create a Python 3.10 virtual environment, run pip install -r requirements/pt2.txt, then run pip install . from the repository root.
  • Models: Download weights (e.g., sv4d.safetensors, sv3d_u.safetensors) to a checkpoints/ directory.
  • Inference: Run sampling scripts like python scripts/sampling/simple_video_sample_4d.py --input_path <path/to/video>.
  • Demos: Streamlit demos are available in scripts/demo/.
  • Dependencies: PyTorch 2.0 is recommended; PyTorch 1.13 is required for some autoencoder training configurations.
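
The steps above can be wrapped in a small pre-flight helper. The sketch below is a hypothetical convenience, not part of the repository: it only checks that the weight files named above exist in checkpoints/ and assembles the sampling command with the --input_path flag shown above. The function names are illustrative, and nothing is executed.

```python
import sys
from pathlib import Path
from typing import List

# Script path and flag are taken from the quick-start text above.
SAMPLING_SCRIPT = "scripts/sampling/simple_video_sample_4d.py"


def missing_checkpoints(checkpoint_dir: str, names: List[str]) -> List[str]:
    """Return the weight files that are not present in checkpoint_dir."""
    base = Path(checkpoint_dir)
    return [name for name in names if not (base / name).is_file()]


def build_sampling_command(input_path: str, python: str = sys.executable) -> List[str]:
    """Assemble the sampling command as an argument list (hypothetical helper)."""
    return [python, SAMPLING_SCRIPT, "--input_path", input_path]


if __name__ == "__main__":
    missing = missing_checkpoints(
        "checkpoints", ["sv4d.safetensors", "sv3d_u.safetensors"]
    )
    if missing:
        print("Missing weights:", ", ".join(missing))
    else:
        # Placeholder path kept as in the quick-start text; replace before running.
        print("Would run:", " ".join(build_sampling_command("<path/to/video>")))
```

To actually launch sampling, the returned list could be passed to subprocess.run(cmd, check=True) from the repository root, assuming the environment from the install step is active.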

Highlighted Details

  • SV4D: Generates 40 frames (5 video frames x 8 camera views) at 576x576 resolution, conditioned on 5 context frames and 8 reference views, with a novel sampling method for longer videos.
  • SV3D: Image-to-video model for novel multi-view synthesis, with SV3D_u (orbital videos) and SV3D_p (specified camera paths) variants.
  • SDXL-Turbo: Distilled text-to-image model designed for fast, few-step generation.
  • SVD/SVD-XT: Image-to-video models generating 14 or 25 frames at 576x1024.
  • Invisible Watermarking: Generated images include an invisible watermark with a detection script provided.

Maintenance & Community

The project is actively developed by Stability AI. Links to project pages, tech reports, and video summaries are provided for specific models.

Licensing & Compatibility

Models are released under various licenses, including CreativeML Open RAIL++-M and research-only licenses that require an application for access (e.g., SDXL-0.9). These licenses may restrict commercial use.

Limitations & Caveats

The README notes potential Python version conflicts and that some training configurations require specific dataset formats (webdataset) and manual edits. Autoencoder training is limited to PyTorch 1.13. Access to certain model weights requires application and approval.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 548 stars in the last 90 days
