VideoGPT  by wilson1yan

Video generation research paper using VQ-VAE and Transformers

Created 4 years ago
1,048 stars

Top 36.0% on SourcePulse

Project Summary

VideoGPT is a straightforward architecture for generative video modeling built from a VQ-VAE and a Transformer. It targets researchers and practitioners who want a reproducible, minimal approach to video generation. Despite its simplicity, it produces samples competitive with GANs on benchmark datasets such as BAIR Robot.

How It Works

VideoGPT employs a two-stage process. First, a VQ-VAE with 3D convolutions and axial self-attention compresses raw video into a grid of discrete latent codes. Second, a GPT-like Transformer models those codes autoregressively, using spatio-temporal position encodings to keep track of where each code sits in time and space. Because the Transformer operates on the compressed codes rather than on raw pixels, the sequence it must model is much shorter, and the two stages stay cleanly separated and modular.
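As a rough illustration of the first stage, the sketch below shows nearest-neighbour vector quantization on a toy latent grid, followed by the raster-scan flattening that turns the grid into a token sequence for an autoregressive model. It is not code from the VideoGPT repository: the function names (nearest_code, quantize_grid, flatten_with_positions), the 2-dimensional codebook, and the raster-scan order are all assumptions chosen for readability, and the real encoder, codebook size, and position-encoding scheme are not specified in this summary.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_code(z: Vector, codebook: Sequence[Vector]) -> int:
    """Return the index of the codebook vector closest to z (squared L2 distance)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(codebook)), key=lambda k: sq_dist(z, codebook[k]))


def quantize_grid(latents, codebook) -> List[List[List[int]]]:
    """Map a T x H x W grid of continuous latent vectors to integer code indices."""
    return [
        [[nearest_code(z, codebook) for z in row] for row in frame]
        for frame in latents
    ]


def flatten_with_positions(codes) -> Tuple[List[int], List[Tuple[int, int, int]]]:
    """Raster-scan the code grid into a 1D sequence, keeping (t, h, w) per token.

    The (t, h, w) triples are the kind of information a spatio-temporal
    position encoding would consume; the ordering here is an assumption.
    """
    tokens, positions = [], []
    for t, frame in enumerate(codes):
        for h, row in enumerate(frame):
            for w, code in enumerate(row):
                tokens.append(code)
                positions.append((t, h, w))
    return tokens, positions


# Toy example: 3 codebook entries, a latent grid with T=2, H=1, W=2.
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 1.0)]
latents = [
    [[(0.1, -0.2), (0.9, 1.2)]],   # frame 0
    [[(-0.8, 0.7), (0.4, 0.4)]],   # frame 1
]

codes = quantize_grid(latents, codebook)
tokens, positions = flatten_with_positions(codes)
print(tokens)      # discrete sequence a Transformer would be trained on
print(positions)   # matching (t, h, w) coordinates
```

In a real model the latent vectors would come from a learned 3D-convolutional encoder and the codebook would be learned jointly with it; the sketch only shows the lookup and flattening steps.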

Quick Start & Requirements

  • Install: pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 torchaudio==0.7.2 -f https://download.pytorch.org/whl/torch_stable.html followed by pip install git+https://github.com/wilson1yan/VideoGPT.git.
  • Prerequisites: CUDA 11.0 (cudatoolkit=11.0), Python 3.7+, PyTorch 1.7.1. Optional sparse attention requires llvm-9-dev and deepspeed.
  • Data: HDF5 format or directory structure with MP4 videos. Scripts are provided for BAIR and UCF-101 preprocessing.
  • Demo: hosted on Hugging Face Spaces.

Highlighted Details

  • Generates samples competitive with state-of-the-art GANs on BAIR Robot dataset.
  • Generates high-fidelity natural videos when trained on UCF-101 and TGIF.
  • Supports optional sparse attention for compute-limited scenarios.
  • Provides scripts for VQ-VAE and VideoGPT training, sampling, and FVD/Inception Score evaluation.
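For readers unfamiliar with the Inception Score mentioned above, the following minimal sketch computes it from a list of per-sample class probabilities using the standard definition, IS = exp(mean over samples of KL(p(y|x) || p(y))). It is not the repository's evaluation script: the function name inception_score is hypothetical, and in practice the probabilities would come from a pretrained classifier applied to generated samples rather than being written by hand.

```python
import math
from typing import List, Sequence


def inception_score(probs: Sequence[Sequence[float]]) -> float:
    """Compute exp(E_x[KL(p(y|x) || p(y))]) from per-sample class probabilities.

    probs[i][c] is the (assumed) classifier probability of class c for sample i.
    """
    n = len(probs)
    num_classes = len(probs[0])

    # Marginal class distribution p(y), averaged over all samples.
    marginal: List[float] = [
        sum(p[c] for p in probs) / n for c in range(num_classes)
    ]

    # Average KL divergence between each conditional and the marginal.
    total_kl = 0.0
    for p in probs:
        total_kl += sum(
            pc * math.log(pc / marginal[c])
            for c, pc in enumerate(p)
            if pc > 0.0  # 0 * log(0) is treated as 0
        )
    return math.exp(total_kl / n)


# Confident and diverse predictions give a high score (upper bound = number of classes).
diverse = [[1.0, 0.0], [0.0, 1.0]]
# Identical, uninformative predictions give the minimum score of 1.
uniform = [[0.5, 0.5], [0.5, 0.5]]

print(inception_score(diverse))
print(inception_score(uniform))
```

A higher score indicates that individual samples are classified confidently while the set as a whole covers many classes. FVD follows a different recipe (a Fréchet distance between feature statistics) and is not covered by this sketch.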

Maintenance & Community

The project is associated with authors from UC Berkeley and Google. Further details on community or roadmap are not explicitly stated in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The code is provided for research purposes.

Limitations & Caveats

The README notes that reproducing the full paper results requires a separate, less clean codebase. The pinned PyTorch version (1.7.1) is old, so a dedicated environment is likely needed to avoid conflicts with newer libraries.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 30 days
