VideoGPT by wilson1yan

Video generation research paper using VQ-VAE and Transformers

created 4 years ago
1,045 stars

Top 36.6% on sourcepulse

Project Summary

VideoGPT offers a straightforward architecture for generative video modeling using VQ-VAE and Transformers. It's designed for researchers and practitioners looking for a reproducible, minimalistic approach to video generation, competitive with GANs on benchmark datasets like BAIR Robot.

How It Works

VideoGPT employs a two-stage process. First, a VQ-VAE with 3D convolutions and axial self-attention discretizes raw video into a sequence of latent codes. Second, a GPT-like Transformer autoregressively models these discrete latents, incorporating spatio-temporal position encodings. This approach simplifies training and allows for competitive generation quality with a clean, modular design.
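
The two-stage idea can be pictured with a toy sketch: stage one snaps continuous latents to the nearest codebook entry, stage two models the flattened index sequence with a causal Transformer. This is a minimal PyTorch illustration of the concept, not the repository's actual modules (which add 3D convolutions, axial attention, and spatio-temporal position encodings).

    import torch
    import torch.nn as nn

    class ToyQuantizer(nn.Module):
        """Nearest-neighbour vector quantization against a learned codebook."""
        def __init__(self, n_codes=512, dim=64):
            super().__init__()
            self.codebook = nn.Embedding(n_codes, dim)

        def forward(self, z):                                # z: (B, L, dim) continuous latents
            flat = z.reshape(-1, z.size(-1))                 # (B*L, dim)
            dists = torch.cdist(flat, self.codebook.weight)  # distance to every code
            idx = dists.argmin(dim=-1).view(z.shape[:-1])    # (B, L) discrete code indices
            return idx, self.codebook(idx)                   # indices + quantized vectors

    # Stage 2: a causal Transformer predicts the next latent code in the flattened sequence.
    B, L, D, n_codes = 2, 4 * 8 * 8, 64, 512                 # e.g. a 4x8x8 downsampled latent grid
    prior = nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=D, nhead=8), num_layers=4)
    to_logits = nn.Linear(D, n_codes)

    z = torch.randn(B, L, D)                                 # stand-in for the VQ-VAE encoder output
    idx, quantized = ToyQuantizer(n_codes, D)(z)
    causal_mask = torch.triu(torch.full((L, L), float('-inf')), diagonal=1)
    hidden = prior(quantized.transpose(0, 1), mask=causal_mask)   # sequence-first: (L, B, D)
    logits = to_logits(hidden.transpose(0, 1))               # (B, L, n_codes) next-code predictions

Training the prior then reduces to cross-entropy between these logits and the (shifted) code indices, and sampling decodes generated indices back through the VQ-VAE decoder.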

Quick Start & Requirements

  • Install: pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html followed by pip install git+https://github.com/wilson1yan/VideoGPT.git (a usage sketch follows this list).
  • Prerequisites: CUDA 11.0 (cudatoolkit=11.0), Python 3.7+, PyTorch 1.7.1. Optional sparse attention requires llvm-9-dev and deepspeed.
  • Data: HDF5 format or directory structure with MP4 videos. Scripts are provided for BAIR and UCF-101 preprocessing.
  • Demo: Hugging Face Spaces
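
The README also documents loading the released pretrained VQ-VAEs directly from Python. The sketch below follows that usage as best recalled; the helper names (load_vqvae, preprocess, encode/decode) and the checkpoint id are assumptions to verify against the current repo.

    import torch
    from torchvision.io import read_video

    # Names below mirror the README's pretrained-model example; verify them before use.
    from videogpt import load_vqvae
    from videogpt.data import preprocess

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

    # Checkpoint id is an assumption; the README lists the released VQ-VAE checkpoints.
    vqvae = load_vqvae('bair_stride4x2x2').to(device)

    # Read an MP4 clip, resize/crop to 64x64, keep 16 frames, then add a batch dimension.
    video = read_video('path/to/clip.mp4', pts_unit='sec')[0]     # (T, H, W, C) uint8
    video = preprocess(video, 64, 16).unsqueeze(0).to(device)

    with torch.no_grad():
        codes = vqvae.encode(video)    # discrete latent codes for the clip
        recon = vqvae.decode(codes)    # reconstruction from those codes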

Highlighted Details

  • Generates samples competitive with state-of-the-art GANs on the BAIR Robot dataset.
  • Generates high-fidelity natural videos from UCF-101 and TGIF.
  • Supports optional sparse attention for compute-limited scenarios.
  • Provides scripts for VQ-VAE and VideoGPT training, sampling, and FVD/Inception Score evaluation.

Maintenance & Community

The project is associated with authors from UC Berkeley and Google. Further details on community or roadmap are not explicitly stated in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The code is provided for research purposes.

Limitations & Caveats

The README notes that reproducing the full paper results requires a separate, less polished codebase. The pinned PyTorch version (1.7.1) is old, so a dedicated environment may be needed to avoid conflicts with newer libraries.

Health Check

  • Last commit: 10 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 12 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), Jiayi Pan (author of SWE-Gym; AI researcher at UC Berkeley), and 4 more.

taming-transformers by CompVis
Image synthesis research paper using transformers
Top 0.1% · 6k stars · created 4 years ago · updated 1 year ago