video-diffusion-pytorch by lucidrains

PyTorch implementation of video diffusion models

created 3 years ago
1,337 stars

Top 30.7% on sourcepulse

Project Summary

This repository provides a PyTorch implementation of Video Diffusion Models, extending image diffusion models to video generation. It targets researchers and practitioners in generative AI, offering a way to synthesize videos from scratch or conditioned on text.

How It Works

The core of the implementation is a space-time factored U-Net: instead of full attention over every space-time token at once, attention is split into spatial attention within each frame and temporal attention across frames at each spatial location. This factorization keeps attention tractable for the much larger token counts of video compared to static images, enabling better video quality and faster convergence.

Quick Start & Requirements

  • Install via pip: pip install video-diffusion-pytorch
  • Requires PyTorch.
  • For text conditioning, BERT-large is used by default, or BERT-base can be specified.
  • Training can be performed using a Trainer class on a folder of GIFs.
  • Official project page: https://github.com/lucidrains/video-diffusion-pytorch
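A minimal unconditional training-and-sampling sketch, with class names and arguments taken from the project README (check the repository for current signatures before relying on them):

```python
import torch
from video_diffusion_pytorch import Unet3D, GaussianDiffusion

# Space-time factored U-Net denoiser
model = Unet3D(
    dim = 64,
    dim_mults = (1, 2, 4, 8)
)

diffusion = GaussianDiffusion(
    model,
    image_size = 32,   # spatial size of each frame
    num_frames = 5,    # frames per video
    timesteps = 1000   # diffusion steps
)

# Dummy batch: (batch, channels, frames, height, width)
videos = torch.randn(1, 3, 5, 32, 32)
loss = diffusion(videos)
loss.backward()

# After training, sample new clips of shape (4, 3, 5, 32, 32)
sampled = diffusion.sample(batch_size = 4)
```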

Highlighted Details

  • Achieves 14k FID on moving MNIST, outperforming NUWA.
  • Supports text-to-video generation by conditioning on text embeddings (BERT-large) or directly on text strings.
  • Includes a Trainer class for simplified training on GIF datasets.
  • Explores co-training images and videos by focusing attention on the present moment.

Maintenance & Community

  • Developed by lucidrains, a prolific contributor to generative AI research implementations.
  • Mentions resources provided by Stability.ai.
  • Future text-to-video developments are centralized at Imagen-pytorch.

Licensing & Compatibility

  • The repository does not explicitly state a license in the provided README; verify licensing before commercial use or linking into closed-source projects.

Limitations & Caveats

  • The project is marked as "wip" (work in progress).
  • Some planned features, such as a 3D CLIP, are still in the "todo" list.
  • The README notes that torchvideo appears immature, suggesting potential challenges with video data handling libraries.
  • No explicit mention of hardware requirements (e.g., GPU, VRAM) for training or inference.
Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 35 stars in the last 90 days
