STIT by rotemtzaban

GAN-based framework for facial video editing, accompanying a research paper

created 3 years ago
1,205 stars

Top 33.2% on sourcepulse

Project Summary

STIT (Stitch it in Time) addresses the challenge of semantic facial editing in real videos using Generative Adversarial Networks (GANs). It targets researchers and practitioners in computer vision and graphics who need to perform high-quality, temporally coherent facial manipulations on videos, offering significant improvements over existing methods for talking-head videos.

How It Works

STIT leverages the temporal consistency inherent in source videos together with the strong prior learned by StyleGAN's latent space. Each frame is inverted into the latent space of StyleGAN2-ADA, edited there, and regenerated; a dedicated "stitching tuning" step then fine-tunes the generator so that the edited face blends seamlessly back into the unedited frame. Because the pipeline treats alignment, inversion, editing, and blending carefully rather than frame-by-frame in isolation, edits deviate minimally from the video's natural temporal flow.
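The pipeline described above can be sketched schematically. This is a hedged toy illustration, not the authors' implementation: `invert`, `edit`, and `generate` are stand-ins for StyleGAN2-ADA inversion, latent-space editing, and synthesis, and the soft `mask` is a hypothetical proxy for the stitching step that blends the edited face into the original frame.

```python
# Toy sketch of the invert -> edit -> generate -> stitch pipeline.
# Frames are flat lists of floats standing in for pixel grids.

def invert(frame):
    """Stand-in for GAN inversion: map a frame to a latent code."""
    return [p * 0.5 for p in frame]

def edit(latent, direction, strength):
    """Move the latent along a semantic direction (e.g. 'age')."""
    return [l + strength * d for l, d in zip(latent, direction)]

def generate(latent):
    """Stand-in for the generator: latent code -> edited face."""
    return [l * 2.0 for l in latent]

def stitch(original, edited_face, mask):
    """Blend the edited face into the original frame with a soft mask,
    so untouched background pixels stay identical across frames."""
    return [m * e + (1.0 - m) * o
            for o, e, m in zip(original, edited_face, mask)]

def edit_video(frames, direction, mask, strength=1.0):
    out = []
    for frame in frames:  # the same edit is applied to every frame
        latent = invert(frame)
        face = generate(edit(latent, direction, strength))
        out.append(stitch(frame, face, mask))
    return out

frames = [[1.0, 2.0], [1.1, 2.1]]  # two tiny 2-"pixel" frames
direction = [0.0, 1.0]             # edit only the second component
mask = [0.0, 1.0]                  # stitch only the second pixel
result = edit_video(frames, direction, mask)
print(result[0])  # prints [1.0, 4.0]
```

The key design point the real system shares with this sketch is that the unedited region is taken from the source frames, which are temporally consistent by construction, so coherence only has to be enforced at the blend boundary.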

Quick Start & Requirements

  • Installation: Clone the repository and install dependencies via pip install -r requirements.txt. For StyleCLIP edits, run pip install git+https://github.com/openai/CLIP.git.
  • Prerequisites: PyTorch (1.8+) with torchvision, CUDA toolkit 11.0+, and StyleGAN2-ADA-PyTorch requirements. Pretrained models must be downloaded and placed in the project directory or configured via configs/path_config.py. Videos need to be split into individual frames (e.g., using ffmpeg).
  • Resources: Requires downloading large pretrained models.
  • Links: Project Page, StyleGAN2-ADA-PyTorch.
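The installation steps above can be collected into a short setup script. The repository URL and the ffmpeg frame-naming pattern are assumptions, not taken from the project docs; pretrained model paths must still be configured via configs/path_config.py.

```shell
# Hedged sketch of the setup steps; repo URL and frame pattern are assumptions.
git clone https://github.com/rotemtzaban/STIT.git
cd STIT
pip install -r requirements.txt

# Needed only for StyleCLIP-based edits
pip install git+https://github.com/openai/CLIP.git

# Split the input video into individual frames, as the pipeline expects
mkdir -p frames
ffmpeg -i input_video.mp4 frames/%04d.png
```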

Highlighted Details

  • Achieves significant improvements in temporal consistency for facial editing in videos.
  • Demonstrates effectiveness on challenging, high-quality talking-head videos.
  • Offers a "stitching tuning" mechanism to maintain frame-to-frame coherence.
  • Supports editing out-of-domain videos (animations) with specific parameter adjustments.

Maintenance & Community

The project accompanies the authors' published research. Links to the relevant research papers and to the licenses of the underlying components are provided in the repository.

Licensing & Compatibility

The project incorporates components with various licenses: NVIDIA Source Code License (StyleGAN2-ADA), MIT (PTI, e4e, StyleCLIP, face-parsing.PyTorch), BSD 2-Clause (LPIPS), and Creative Commons NonCommercial (stylegan2-distillation). The non-commercial clause from stylegan2-distillation may restrict commercial use.

Limitations & Caveats

The project relies on specific versions of PyTorch and CUDA. Some components have non-commercial licenses, potentially limiting broader adoption. The effectiveness on out-of-domain videos requires specific parameter tuning.

Health Check

  • Last commit: 3 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 90 days
