stylegan3-editing by yuval-alaluf

Image/video editing research paper using StyleGAN3

created 3 years ago
684 stars

Top 50.6% on sourcepulse

Project Summary

This repository provides the official implementation for "Third Time's the Charm? Image and Video Editing with StyleGAN3," focusing on analyzing and leveraging the StyleGAN3 architecture for image and video manipulation. It's designed for researchers and practitioners in generative AI and computer vision who want to explore advanced editing capabilities beyond StyleGAN2.

How It Works

The project analyzes StyleGAN3's latent spaces and finds the W and W+ spaces more entangled than their StyleGAN2 counterparts, so it recommends StyleSpace for fine-grained editing. It also introduces an encoder that is trained on aligned data yet can invert unaligned images, and a video editing workflow that reduces texture sticking and expands the field of view using a fine-tuned StyleGAN3 generator.
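
As a rough illustration of the invert-then-edit workflow described above, the sketch below encodes an image into W+ latents and shifts them along a semantic direction before re-synthesizing. The encoder, generator, and direction objects are placeholders, not this repository's actual API.

    # Minimal invert-then-edit sketch (placeholder objects, not this repo's API).
    import torch

    @torch.no_grad()
    def invert_and_edit(encoder, generator, image, direction, strength=2.0):
        """image: (1, 3, H, W) tensor; direction: edit vector broadcastable to the W+ codes."""
        w_plus = encoder(image)                  # invert: image -> W+ latent codes
        w_edit = w_plus + strength * direction   # move along a semantic direction (e.g., age)
        return generator.synthesis(w_edit)       # decode the edited latents back to an image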

Quick Start & Requirements

  • Installation: Recommended via Anaconda using environment/sg3_env.yaml.
  • Prerequisites: Linux or macOS, NVIDIA GPU with CUDA, CuDNN, Python 3.
  • Setup: Requires downloading pre-trained StyleGAN3 generators and auxiliary models (e.g., IR-SE50, MTCNN) into a pretrained_models directory; a small sanity-check sketch follows this list.
  • Resources: Training encoders involves significant GPU time. Inference is more accessible.
  • Documentation: Inference Notebook available for editing real images.
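
Before running inference, a quick sanity check such as the one below can confirm that CUDA is visible and the expected checkpoints are in place. This is only a sketch: the conda command is the one recommended above, while the checkpoint filenames are hypothetical examples.

    # Environment/asset sanity check (a sketch, not part of the repo).
    # Assumes the conda environment was created and activated first:
    #     conda env create -f environment/sg3_env.yaml
    # The checkpoint filenames below are hypothetical examples.
    from pathlib import Path
    import torch

    def check_setup(model_dir="pretrained_models",
                    expected=("stylegan3_generator.pkl", "restyle_e4e_encoder.pt")):
        assert torch.cuda.is_available(), "An NVIDIA GPU with CUDA is required."
        missing = [f for f in expected if not (Path(model_dir) / f).exists()]
        if missing:
            print(f"Missing checkpoints in {model_dir}: {missing}")
        else:
            print("All expected checkpoints found.")

    if __name__ == "__main__":
        check_setup()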

Highlighted Details

  • Offers a novel video inversion and editing workflow with smoothing and FOV expansion.
  • Supports editing via InterFaceGAN and StyleCLIP global directions.
  • Includes Pivotal Tuning Inversion (PTI) with improved initialization using the project's encoder (see the sketch after this list).
  • Provides pre-trained encoders (ReStyle-pSp, ReStyle-e4e) for StyleGAN3.
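
To illustrate the idea behind PTI, the sketch below fine-tunes the generator around a fixed "pivot" latent produced by the encoder. It is a simplification rather than this repository's implementation: only an L2 reconstruction loss is shown, whereas PTI also uses a perceptual (LPIPS) term and locality regularization, and the generator and w_pivot inputs are assumed.

    # Core PTI loop, simplified: keep the pivot latent fixed and fine-tune the
    # generator so it reconstructs the target image at that latent.
    # `generator` and `w_pivot` are assumed inputs; real PTI adds LPIPS and
    # locality regularization on top of the reconstruction loss.
    import torch
    import torch.nn.functional as F

    def pivotal_tuning(generator, w_pivot, target_image, steps=350, lr=3e-4):
        w_pivot = w_pivot.detach()                    # the pivot latent stays fixed
        opt = torch.optim.Adam(generator.parameters(), lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            recon = generator.synthesis(w_pivot)      # image generated at the pivot
            loss = F.mse_loss(recon, target_image)    # reconstruction term (L2 only here)
            loss.backward()
            opt.step()
        return generator                              # generator is now tuned around w_pivot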

Maintenance & Community

The project was developed by authors affiliated with Tel Aviv University and Adobe. Links to the relevant papers and codebases are provided.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. It builds on and acknowledges the official StyleGAN3 codebase, which NVIDIA distributes under the NVIDIA Source Code License rather than a permissive license such as MIT; suitability for commercial use therefore depends on the terms of that license and on whatever terms apply to this repository's own code.

Limitations & Caveats

The README notes that the W/W+ latent spaces in StyleGAN3 are more entangled than StyleGAN2's. CPU-only execution is not supported out of the box, though it may be possible with modifications. Training custom InterFaceGAN boundaries requires generating large datasets of latent codes and corresponding attribute scores; a sketch of that recipe follows.
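
For orientation, fitting an InterFaceGAN boundary amounts to training a linear classifier that separates latent codes by attribute and taking the unit normal of its hyperplane as the edit direction. The sketch below follows that general recipe and is not this repository's exact script.

    # InterFaceGAN-style boundary training sketch (general recipe, not this
    # repo's exact script): fit a linear SVM on latent codes labeled by an
    # attribute score and return the unit normal of the separating hyperplane.
    import numpy as np
    from sklearn.svm import LinearSVC

    def train_boundary(latents, scores, threshold=0.5):
        """latents: (N, w_dim) array of latent codes; scores: (N,) attribute scores."""
        labels = (scores > threshold).astype(int)   # binarize the attribute scores
        svm = LinearSVC(C=1.0, max_iter=10000)
        svm.fit(latents, labels)
        normal = svm.coef_.reshape(-1)              # normal vector of the separating hyperplane
        return normal / np.linalg.norm(normal)      # unit-length edit direction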

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 4 stars in the last 90 days
