Framework for high-resolution editable talking face generation
StyleHEAT is a framework for generating high-resolution, editable talking face videos using a pre-trained StyleGAN. It targets researchers and developers in computer vision and graphics who need to create realistic and controllable facial animations from images or videos, enabling applications like virtual avatars and content creation.
How It Works
StyleHEAT builds on the observation that the intermediate feature space of a pre-trained StyleGAN has useful spatial transformation properties: warping these feature maps transforms the generated face accordingly. On top of this, it proposes a unified framework that supports high-resolution video generation, disentangled control via driving video or audio, and flexible face editing. Input images and videos are mapped into the StyleGAN latent space with inversion techniques (E4E, HFGI), enabling manipulation and generation.
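The overall flow is easiest to see as code. Below is a minimal conceptual sketch of that pipeline, assuming the components described above; every function and parameter name is an illustrative placeholder, not the repository's actual API.

```python
# Conceptual sketch of the StyleHEAT pipeline (placeholder names only,
# not the repository's actual API). Model components are injected as
# callables so the function itself stands alone.

def generate_frame(source_image, driving_signal,
                   invert, stylegan_features, motion_net, warp, synthesize):
    """Produce one talking-face frame from a source image and a driving signal."""
    # 1. Invert the source image into the StyleGAN latent space
    #    (the repo uses E4E/HFGI encoders, optionally refined by optimization).
    latent = invert(source_image)

    # 2. Run StyleGAN up to an intermediate layer to obtain spatial feature maps.
    feats = stylegan_features(latent)

    # 3. Predict a motion flow field from the driving video frame or audio
    #    (via a 3DMM-based motion representation) and warp the feature maps.
    flow = motion_net(source_image, driving_signal)
    warped = warp(feats, flow)

    # 4. Finish synthesis from the warped features to get a high-resolution frame.
    return synthesize(warped, latent)
```

In this view, attribute editing corresponds naturally to adjusting the latent code, and intuitive editing to adjusting the flow field, before the final synthesis step.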
Quick Start & Requirements
Create a conda environment (conda create -n StyleHEAT python=3.7), activate it (conda activate StyleHEAT), and install dependencies (pip install -r requirements.txt). PyTorch 1.7.1 with CUDA 11.0 is required.
Download the pre-trained checkpoints with bash/download.sh or manually place them in the ./checkpoints directory. This includes StyleGAN models, inversion encoders, and 3DMM libraries.
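Put together, setup might look like the following shell session, a sketch assembled from the commands above (the download-script path is copied verbatim from the README fragment and may need adjusting):

```bash
# create and activate the environment
conda create -n StyleHEAT python=3.7
conda activate StyleHEAT

# install dependencies (PyTorch 1.7.1 with CUDA 11.0 is expected)
pip install -r requirements.txt

# fetch pre-trained checkpoints (StyleGAN models, inversion encoders,
# 3DMM libraries) into ./checkpoints, or place them there manually
bash/download.sh
```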
Run python inference.py with the appropriate configuration for same-identity reenactment, cross-identity reenactment, intuitive editing, attribute editing, or audio reenactment. Audio reenactment requires third_part/SadTalker and additional libraries such as pydub, yacs, librosa, numba, resampy, and imageio-ffmpeg.
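For illustration, an audio-reenactment run might look like the sketch below. Apart from inference.py itself and the package list above, the config path and flag names are assumptions, so check the repository's README for the exact interface.

```bash
# extra dependencies needed for audio reenactment via third_part/SadTalker
pip install pydub yacs librosa numba resampy imageio-ffmpeg

# hypothetical flags and paths -- verify against the actual README
python inference.py \
    --config configs/inference.yaml \
    --image_source ./source.png \
    --audio_source ./speech.wav \
    --output_dir ./results \
    --inversion_option encode    # or: optimize (slower, better quality)
```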
Highlighted Details
StyleGAN inversion can either be run in optimize mode for better results (slower) or in encode mode using HFGI, as illustrated (with assumed flag names) in the inference sketch above.
Maintenance & Community
The project accompanies an ECCV 2022 paper. It acknowledges contributions from StyleGAN2, PIRenderer, HFGI, Barbershop, GFP-GAN, Pixel2Style2Pixel, and SadTalker.
Licensing & Compatibility
The repository does not state a license in its README. It builds upon several other projects, some of which carry their own licensing terms; users should verify compatibility before any commercial use.
Limitations & Caveats
Training requires significant data preprocessing for datasets such as VoxCeleb and HDTF, with separate steps outlined for the VideoWarper stage and for full-framework training. The README lists the shell script for 3DMM parameter extraction as a TODO, so the provided tooling may be incomplete.