TATS: PyTorch code for the long video generation research paper (ECCV 2022)
TATS is a PyTorch framework for generating long videos, addressing the challenge of creating thousands of frames from models trained on shorter sequences. It is designed for researchers and practitioners in generative AI and video synthesis.
How It Works
TATS employs a two-stage approach: a Time-Agnostic VQGAN for efficient video tokenization and a Time-Sensitive Transformer for autoregressive generation. This separation allows the VQGAN to learn robust visual representations independent of temporal dynamics, while the transformer focuses on capturing temporal coherence, enabling the generation of significantly longer videos than typically possible with direct autoregressive models.
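To make the two-stage design concrete, here is a minimal, self-contained sketch of the idea. All class and function names (`ToyVideoTokenizer`, `ToyTemporalTransformer`, `generate`), shapes, and hyperparameters are invented for illustration and do not match the repository's actual API:

```python
import torch
import torch.nn as nn

class ToyVideoTokenizer(nn.Module):
    """Stage 1, sketched: a time-agnostic tokenizer that maps each frame
    feature to the nearest entry of a learned codebook (VQ lookup)."""
    def __init__(self, num_codes=1024, dim=64):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, dim)
        self.encoder = nn.Linear(3 * 16 * 16, dim)  # stand-in for the conv encoder

    def encode(self, patches):
        # patches: (B, T, 3*16*16) flattened frame patches
        z = self.encoder(patches)                    # (B, T, dim)
        d = torch.cdist(z, self.codebook.weight)     # (B, T, num_codes)
        return d.argmin(-1)                          # (B, T) discrete token ids

class ToyTemporalTransformer(nn.Module):
    """Stage 2, sketched: a causal transformer trained to predict the
    next video token, capturing temporal coherence."""
    def __init__(self, num_codes=1024, dim=64):
        super().__init__()
        self.embed = nn.Embedding(num_codes, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, num_codes)

    def forward(self, tokens):
        # tokens: (B, T); causal mask blocks attention to future positions
        T = tokens.size(1)
        mask = torch.triu(torch.full((T, T), float("-inf")), diagonal=1)
        h = self.blocks(self.embed(tokens), mask=mask)
        return self.head(h)                          # (B, T, num_codes) logits

@torch.no_grad()
def generate(model, prompt, steps, window=16):
    """Roll generation past the training length by conditioning on a
    sliding window of the most recent tokens."""
    tokens = prompt                                  # (B, T0) seed tokens
    for _ in range(steps):
        logits = model(tokens[:, -window:])
        nxt = logits[:, -1].argmax(-1, keepdim=True) # greedy next token
        tokens = torch.cat([tokens, nxt], dim=1)
    return tokens

seed = torch.randint(0, 1024, (1, 8))
long_tokens = generate(ToyTemporalTransformer(), seed, steps=64)
print(long_tokens.shape)  # (1, 72): far beyond the 8-token seed
```

The key point the sketch illustrates: because the tokenizer carries no notion of absolute time, the transformer can be rolled forward with a sliding window well past the sequence length it was trained on.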
Quick Start & Requirements
conda create -n tats python=3.8
conda activate tats
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch
pip install pytorch-lightning==1.5.4 einops ftfy h5py imageio imageio-ffmpeg regex scikit-video tqdm
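After installing, a quick sanity check (generic Python, not a script from the repository) confirms the pinned stack resolved correctly:

```python
import torch
import pytorch_lightning as pl

print("torch:", torch.__version__)        # conda build against cudatoolkit 10.2
print("lightning:", pl.__version__)       # README pins 1.5.4
print("CUDA available:", torch.cuda.is_available())
```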
Highlighted Details
Maintenance & Community
The project accompanies an ECCV 2022 paper and acknowledges code from VQGAN and VideoGPT. The README lists no community channels (Discord/Slack) and shows no signals of active maintenance.
Licensing & Compatibility
Limitations & Caveats
The setup pins older versions of PyTorch (built against CUDA 10.2) and PyTorch Lightning (1.5.4), which may conflict with newer environments; in particular, Lightning's Trainer API changed substantially after the 1.x series. The training examples also indicate substantial hardware requirements (multiple GPUs); a multi-GPU launch under the pinned API is sketched below.
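For reference, a multi-GPU run under the pinned Lightning 1.5.x API would look roughly like this. This is a hedged sketch, not the repository's training script; the arguments shown are Lightning's own 1.5-era Trainer flags:

```python
import pytorch_lightning as pl

# Lightning 1.5.x arguments: `gpus` and `strategy="ddp"` were valid in 1.5
# but were later renamed/removed (2.x uses `devices`/`accelerator`), which
# is one source of the compatibility friction noted above.
trainer = pl.Trainer(
    gpus=8,              # GPUs per node
    strategy="ddp",      # distributed data parallel
    precision=16,        # mixed precision to reduce memory use
    max_steps=200_000,   # illustrative step budget
)
# trainer.fit(model, datamodule)  # model/datamodule come from the repo's scripts
```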