PyTorch code for text-guided face image generation/manipulation
TediGAN provides a unified PyTorch framework for text-guided face image generation and manipulation, targeting researchers and practitioners in computer vision and generative AI. It enables high-quality, diverse, and controllable synthesis of facial images based on textual descriptions, building upon StyleGAN and CLIP.
How It Works
TediGAN unifies text-guided image generation and manipulation by pairing a pre-trained StyleGAN generator with a text encoder such as CLIP. The core approach uses GAN inversion to find latent codes corresponding to input images, followed by instance-level optimization with a CLIP loss that steers manipulation or generation toward the text prompt. This allows fine-grained control over image attributes and diverse outputs.
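As a rough illustration of that loop, here is a minimal sketch of CLIP-guided latent optimization. It is not TediGAN's actual code: the `ToyGenerator` stand-in, latent dimensions, prompt, learning rate, and step count are all illustrative placeholders; only the CLIP calls (`clip.load`, `clip.tokenize`, `encode_text`, `encode_image`) are the real openai/CLIP API.

```python
import torch
import torch.nn.functional as F
import clip  # pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)  # pre-trained CLIP

class ToyGenerator(torch.nn.Module):
    """Stand-in for a pre-trained StyleGAN generator (hypothetical).
    A real setup would load StyleGAN weights instead."""
    def __init__(self, latent_dim=512):
        super().__init__()
        self.fc = torch.nn.Linear(latent_dim, 3 * 64 * 64)
    def forward(self, w):
        return torch.tanh(self.fc(w)).view(-1, 3, 64, 64)  # image in [-1, 1]

generator = ToyGenerator().to(device)
w = torch.randn(1, 512, device=device, requires_grad=True)  # latent to optimize

# Encode the text prompt once; only the latent code is updated.
tokens = clip.tokenize(["a smiling woman with blonde hair"]).to(device)
with torch.no_grad():
    text_feat = F.normalize(clip_model.encode_text(tokens).float(), dim=-1)

optimizer = torch.optim.Adam([w], lr=0.01)
for step in range(200):
    image = generator(w)
    # CLIP's image encoder expects 224x224 inputs (a real implementation
    # would also apply CLIP's input normalization here).
    image = F.interpolate(image, size=(224, 224), mode="bilinear",
                          align_corners=False)
    img_feat = F.normalize(clip_model.encode_image(image).float(), dim=-1)
    loss = 1 - (img_feat * text_feat).sum()  # CLIP loss: 1 - cosine similarity
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

For manipulation, `w` would start from the inverted latent of a real face rather than random noise, so that edits stay anchored to the input image.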
Quick Start & Requirements
Install the CLIP dependencies with `pip install ftfy regex tqdm` and `pip install git+https://github.com/openai/CLIP.git`. Then run `python invert.py` for manipulation/generation, or `streamlit run streamlit_app.py` for an online demo.
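As a quick sanity check that the CLIP dependency installed correctly, a snippet like the following should run without errors (the model name and prompt are arbitrary choices, not requirements of this repo):

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # downloads weights on first use
tokens = clip.tokenize(["a face with glasses"]).to(device)
with torch.no_grad():
    print(model.encode_text(tokens).shape)  # torch.Size([1, 512])
```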
Highlighted Details
Maintenance & Community
The project is associated with CVPR 2021, and an extended version is published on arXiv. Contact information for the primary author is provided, and an online demo implemented with Cog is available. The repository was last updated roughly two years ago and is marked inactive.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. However, it heavily relies on and borrows code from other projects such as genforce and CLIP, whose licenses should be consulted for compatibility, especially for commercial use.
Limitations & Caveats
The README notes that training StyleGAN generators requires significant computational resources (e.g., 8 GPUs). While the framework supports high-resolution outputs, out-of-memory errors can occur, and a flag is provided to mitigate them. The project's primary focus is face image generation and manipulation.