TATS: PyTorch code for the long video generation research paper (ECCV 2022)
TATS is a PyTorch framework for generating long videos, addressing the challenge of creating thousands of frames from models trained on shorter sequences. It is designed for researchers and practitioners in generative AI and video synthesis.
How It Works
TATS employs a two-stage approach: a Time-Agnostic VQGAN for efficient video tokenization and a Time-Sensitive Transformer for autoregressive generation. This separation allows the VQGAN to learn robust visual representations independent of temporal dynamics, while the transformer focuses on capturing temporal coherence, enabling the generation of significantly longer videos than typically possible with direct autoregressive models.
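To make the two-stage design concrete, here is a minimal, self-contained sketch of the idea. All class and function names (`ToyVideoTokenizer`, `ToyTemporalTransformer`, `generate`), shapes, and hyperparameters are invented for illustration and do not match the repository's actual API:

```python
import torch
import torch.nn as nn

class ToyVideoTokenizer(nn.Module):
    """Stage 1, sketched: a time-agnostic tokenizer that maps each frame
    feature to the nearest entry of a learned codebook (VQ lookup)."""
    def __init__(self, num_codes=1024, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.encoder = nn.Linear(3 * 16 * 16, dim)  # stand-in for the conv encoder

    def encode(self, patches):
        # patches: (B, T, 3*16*16) flattened frame patches
        z = self.encoder(patches)                    # (B, T, dim)
        d = torch.cdist(z, self.codebook.weight)     # (B, T, num_codes)
        return d.argmin(-1)                          # (B, T) discrete token ids

class ToyTemporalTransformer(nn.Module):
    """Stage 2, sketched: a causal transformer trained to predict the
    next video token, capturing temporal coherence."""
    def __init__(self, num_codes=1024, dim=64):
        super().__init__()
        self.embed = nn.Embedding(num_codes, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_codes)

    def forward(self, tokens):
        # tokens: (B, T); causal mask blocks attention to future positions
        T = tokens.size(1)
        mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.blocks(self.embed(tokens), mask=mask)
        return self.head(h)                          # (B, T, num_codes) logits

@torch.no_grad()
def generate(model, prompt, steps, window=16):
    """Roll generation past the training length by conditioning on a
    sliding window of the most recent tokens."""
    tokens = prompt                                  # (B, T0) seed tokens
    for _ in range(steps):
        logits = model(tokens[:, -window:])
        nxt = logits[:, -1].argmax(-1, keepdim=True) # greedy next token
        tokens = torch.cat([tokens, nxt], dim=1)
    return tokens

seed = torch.randint(0, 1024, (1, 8))
long_tokens = generate(ToyTemporalTransformer(), seed, steps=64)
print(long_tokens.shape)  # (1, 72): far beyond the 8-token seed
```

The key point the sketch illustrates: because the tokenizer carries no notion of absolute time, the transformer can be rolled forward with a sliding window well past the sequence length it was trained on.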
Quick Start & Requirements
conda create -n tats python=3.8
conda activate tats
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
pip install pytorch-lightning==1.5.4 einops ftfy h5py imageio imageio-ffmpeg regex scikit-video tqdm
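After installing, a quick sanity check (generic Python, not a script from the repository) confirms the pinned stack resolved correctly:

```python
import torch
import pytorch_lightning as pl

print("torch:", torch.__version__)        # conda build against cudatoolkit 10.2
print("lightning:", pl.__version__)       # README pins 1.5.4
print("CUDA available:", torch.cuda.is_available())
```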
Highlighted Details
Maintenance & Community
The project accompanies an ECCV 2022 paper and acknowledges code from VQGAN and VideoGPT. The README lists no community channels (Discord/Slack) and shows no signals of active maintenance.
Licensing & Compatibility
Limitations & Caveats
The setup pins older versions of PyTorch (built against CUDA 10.2) and PyTorch Lightning (1.5.4), which may conflict with newer environments; in particular, Lightning's Trainer API changed substantially after the 1.x series. The training examples also indicate substantial hardware requirements (multiple GPUs); a multi-GPU launch under the pinned API is sketched below.
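For reference, a multi-GPU run under the pinned Lightning 1.5.x API would look roughly like this. This is a hedged sketch, not the repository's training script; the arguments shown are Lightning's own 1.5-era Trainer flags:

```python
import pytorch_lightning as pl

# Lightning 1.5.x arguments: `gpus` and `strategy="ddp"` were valid in 1.5
# but were later renamed/removed (2.x uses `devices`/`accelerator`), which
# is one source of the compatibility friction noted above.
trainer = pl.Trainer(
    gpus=8,              # GPUs per node
    strategy="ddp",      # distributed data parallel
    precision=16,        # mixed precision to reduce memory use
    max_steps=200_000,   # illustrative step budget
)
# trainer.fit(model, datamodule)  # model/datamodule come from the repo's scripts
```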