Framework for high-resolution editable talking face generation
StyleHEAT is a framework for generating high-resolution, editable talking face videos using a pre-trained StyleGAN. It targets researchers and developers in computer vision and graphics who need to create realistic and controllable facial animations from images or videos, enabling applications like virtual avatars and content creation.
How It Works
StyleHEAT builds on the observation that the intermediate feature space of a pre-trained StyleGAN has useful spatial transformation properties: warping these feature maps transforms the generated face accordingly. On top of this, it proposes a unified framework that supports high-resolution video generation, disentangled control via driving video or audio, and flexible face editing. Input images and videos are mapped into the StyleGAN latent space with inversion techniques (E4E, HFGI), enabling manipulation and generation.
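The overall flow is easiest to see as code. Below is a minimal conceptual sketch of that pipeline, assuming the components described above; every function and parameter name is an illustrative placeholder, not the repository's actual API.

```python
# Conceptual sketch of the StyleHEAT pipeline (placeholder names only,
# not the repository's actual API). Model components are injected as
# callables so the function itself stands alone.

def generate_frame(source_image, driving_signal,
                   invert, stylegan_features, motion_net, warp, synthesize):
    """Produce one talking-face frame from a source image and a driving signal."""
    # 1. Invert the source image into the StyleGAN latent space
    #    (the repo uses E4E/HFGI encoders, optionally refined by optimization).
    latent = invert(source_image)

    # 2. Run StyleGAN up to an intermediate layer to obtain spatial feature maps.
    feats = stylegan_features(latent)

    # 3. Predict a motion flow field from the driving video frame or audio
    #    (via a 3DMM-based motion representation) and warp the feature maps.
    flow = motion_net(source_image, driving_signal)
    warped = warp(feats, flow)

    # 4. Finish synthesis from the warped features to get a high-resolution frame.
    return synthesize(warped, latent)
```

In this view, attribute editing corresponds naturally to adjusting the latent code, and intuitive editing to adjusting the flow field, before the final synthesis step.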
Quick Start & Requirements
Create a conda environment (conda create -n StyleHEAT python=3.7), activate it (conda activate StyleHEAT), and install dependencies (pip install -r requirements.txt). PyTorch 1.7.1 with CUDA 11.0 is required.
Download the pre-trained checkpoints with bash/download.sh or manually place them in the ./checkpoints directory. This includes StyleGAN models, inversion encoders, and 3DMM libraries.
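Put together, setup might look like the following shell session, a sketch assembled from the commands above (the download-script path is copied verbatim from the README fragment and may need adjusting):

```bash
# create and activate the environment
conda create -n StyleHEAT python=3.7
conda activate StyleHEAT

# install dependencies (PyTorch 1.7.1 with CUDA 11.0 is expected)
pip install -r requirements.txt

# fetch pre-trained checkpoints (StyleGAN models, inversion encoders,
# 3DMM libraries) into ./checkpoints, or place them there manually
bash/download.sh
```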
Run python inference.py with the appropriate configuration for same-identity reenactment, cross-identity reenactment, intuitive editing, attribute editing, or audio reenactment. Audio reenactment requires third_part/SadTalker and additional libraries such as pydub, yacs, librosa, numba, resampy, and imageio-ffmpeg.
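For illustration, an audio-reenactment run might look like the sketch below. Apart from inference.py itself and the package list above, the config path and flag names are assumptions, so check the repository's README for the exact interface.

```bash
# extra dependencies needed for audio reenactment via third_part/SadTalker
pip install pydub yacs librosa numba resampy imageio-ffmpeg

# hypothetical flags and paths -- verify against the actual README
python inference.py \
    --config configs/inference.yaml \
    --image_source ./source.png \
    --audio_source ./speech.wav \
    --output_dir ./results \
    --inversion_option encode    # or: optimize (slower, better quality)
```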
Highlighted Details
StyleGAN inversion can either be run in optimize mode for better results (slower) or in encode mode using HFGI, as illustrated (with assumed flag names) in the inference sketch above.
Maintenance & Community
The project accompanies an ECCV 2022 paper. It acknowledges contributions from StyleGAN2, PIRenderer, HFGI, Barbershop, GFP-GAN, Pixel2Style2Pixel, and SadTalker.
Licensing & Compatibility
The repository does not state a license in its README. It builds upon several other projects, some of which carry their own licensing terms; users should verify compatibility before any commercial use.
Limitations & Caveats
Training requires significant data preprocessing for datasets such as VoxCeleb and HDTF, with separate steps outlined for the VideoWarper stage and for full-framework training. The README lists the shell script for 3DMM parameter extraction as a TODO, so the provided tooling may be incomplete.