video-diffusion-pytorch by lucidrains

PyTorch implementation of video diffusion models

created 3 years ago
1,337 stars

Top 30.7% on sourcepulse

Project Summary

This repository provides a PyTorch implementation of Video Diffusion Models, extending image diffusion models to video generation. It targets researchers and practitioners in generative AI, offering a way to synthesize videos from scratch or conditioned on text.

How It Works

The core of the implementation is a space-time factored U-Net: instead of full attention over every space-time token at once, attention is split into spatial attention within each frame and temporal attention across frames at each spatial location. This factorization keeps attention tractable for the much larger token counts of video compared to static images, enabling better video quality and faster convergence.

Quick Start & Requirements

  • Install via pip: pip install video-diffusion-pytorch
  • Requires PyTorch.
  • For text conditioning, BERT-large is used by default, or BERT-base can be specified.
  • Training can be performed using a Trainer class on a folder of GIFs.
  • Official project page: https://github.com/lucidrains/video-diffusion-pytorch
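A minimal unconditional training-and-sampling sketch, with class names and arguments taken from the project README (check the repository for current signatures before relying on them):

```python
import torch
from video_diffusion_pytorch import Unet3D, GaussianDiffusion

# Space-time factored U-Net denoiser
model = Unet3D(
    dim = 64,
    dim_mults = (1, 2, 4, 8)
)

diffusion = GaussianDiffusion(
    model,
    image_size = 32,   # spatial size of each frame
    num_frames = 5,    # frames per video
    timesteps = 1000   # diffusion steps
)

# Dummy batch: (batch, channels, frames, height, width)
videos = torch.randn(1, 3, 5, 32, 32)
loss = diffusion(videos)
loss.backward()

# After training, sample new clips of shape (4, 3, 5, 32, 32)
sampled = diffusion.sample(batch_size = 4)
```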

Highlighted Details

  • Achieves 14k FID on moving MNIST, outperforming NUWA.
  • Supports text-to-video generation by conditioning on text embeddings (BERT-large) or directly on text strings.
  • Includes a Trainer class for simplified training on GIF datasets.
  • Explores co-training images and videos by focusing attention on the present moment.

Maintenance & Community

  • Developed by lucidrains, a prolific contributor to generative AI research implementations.
  • Mentions resources provided by Stability.ai.
  • Future text-to-video developments are centralized at Imagen-pytorch.

Licensing & Compatibility

  • The repository does not explicitly state a license in the provided README; verify licensing before commercial use or linking into closed-source projects.

Limitations & Caveats

  • The project is marked as "wip" (work in progress).
  • Some planned features, such as a 3D CLIP, are still in the "todo" list.
  • The README notes that torchvideo appears immature, suggesting potential challenges with video data handling libraries.
  • No explicit mention of hardware requirements (e.g., GPU, VRAM) for training or inference.
Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 35 stars in the last 90 days
