Video generation research paper using VQ-VAE and Transformers
VideoGPT offers a straightforward architecture for generative video modeling using VQ-VAE and Transformers. It's designed for researchers and practitioners looking for a reproducible, minimalistic approach to video generation that is competitive with GANs on benchmark datasets such as BAIR Robot Pushing.
How It Works
VideoGPT employs a two-stage process. First, a VQ-VAE with 3D convolutions and axial self-attention discretizes raw video into a sequence of latent codes. Second, a GPT-like Transformer autoregressively models these discrete latents, incorporating spatio-temporal position encodings. This approach simplifies training and allows for competitive generation quality with a clean, modular design.
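A minimal PyTorch sketch of the two-stage idea is shown below. All class names, shapes, and hyperparameters here are invented for illustration and are not the repository's API; the real model uses 3D convolutions, axial self-attention, straight-through gradients, and spatio-temporal position encodings, all omitted for brevity.

import torch
import torch.nn as nn

class ToyVectorQuantizer(nn.Module):
    # Stage 1 (sketch): map continuous encoder features to codebook indices.
    def __init__(self, num_codes=512, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)

    def forward(self, z):
        # z: (batch, num_latents, dim), e.g. a flattened T*H*W grid of features.
        # Nearest-codebook-entry lookup; straight-through trick and losses omitted.
        dists = (z.unsqueeze(-2) - self.codebook.weight).pow(2).sum(-1)
        return dists.argmin(-1)  # (batch, num_latents) discrete latent codes

class ToyLatentPrior(nn.Module):
    # Stage 2 (sketch): GPT-style causal Transformer over the discrete codes.
    def __init__(self, num_codes=512, dim=64, max_len=1024):
        super().__init__()
        self.tok = nn.Embedding(num_codes, dim)
        self.pos = nn.Embedding(max_len, dim)  # stand-in for spatio-temporal encodings
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_codes)

    def forward(self, idx):
        # idx: (batch, num_latents) integer codes produced by stage 1.
        n = idx.size(1)
        x = self.tok(idx) + self.pos(torch.arange(n, device=idx.device))
        x = x.transpose(0, 1)  # (num_latents, batch, dim): sequence-first layout
        mask = torch.full((n, n), float("-inf"), device=idx.device).triu(1)  # causal mask
        x = self.blocks(x, mask=mask)
        return self.head(x.transpose(0, 1))  # (batch, num_latents, num_codes) logits

feats = torch.randn(2, 64, 64)           # stand-in for 3D-conv encoder output
codes = ToyVectorQuantizer()(feats)      # stage 1: discretize the video
logits = ToyLatentPrior()(codes)         # stage 2: next-code prediction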
Quick Start & Requirements
Install the pinned PyTorch build first:
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
then install the package:
pip install git+https://github.com/wilson1yan/VideoGPT.git
Requirements: CUDA 11.0 (cudatoolkit=11.0), Python 3.7+, PyTorch 1.7.1. Optional sparse attention additionally requires llvm-9-dev and deepspeed.
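A quick post-install sanity check (not taken from the README) confirms the pinned build and GPU visibility:
import torch
print(torch.__version__)          # expect 1.7.1+cu110 with the pin above
print(torch.cuda.is_available())  # True on a machine with a working CUDA 11.0 setup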
Highlighted Details
Maintenance & Community
The project is associated with authors from UC Berkeley and Google. Further details on community or roadmap are not explicitly stated in the README.
Licensing & Compatibility
The repository does not explicitly state a license. The code is provided for research purposes.
Limitations & Caveats
The README notes that reproducing full paper results requires a separate, less clean codebase. The provided PyTorch version (1.7.1) is older, potentially requiring environment management for compatibility with newer libraries.
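One way to keep the pinned stack isolated is a dedicated environment; a sketch, not taken from the README (the environment name is arbitrary):
conda create -n videogpt python=3.7
conda activate videogpt
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html
pip install git+https://github.com/wilson1yan/VideoGPT.git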