TediGAN by IIGROUP

PyTorch code for text-guided face image generation/manipulation

created 4 years ago
389 stars

Top 74.9% on sourcepulse

View on GitHub
Project Summary

TediGAN provides a unified PyTorch framework for text-guided face image generation and manipulation, targeting researchers and practitioners in computer vision and generative AI. It enables high-quality, diverse, and controllable synthesis of facial images based on textual descriptions, building upon StyleGAN and CLIP.

How It Works

TediGAN unifies text-guided image generation and manipulation by pairing a pre-trained StyleGAN generator with a text encoder such as CLIP. An input image is first mapped into the StyleGAN latent space via GAN inversion; the resulting latent code is then refined by instance-level optimization against a CLIP loss so that the synthesized image matches the text prompt, and the same procedure drives generation from text alone. This gives fine-grained control over facial attributes while preserving output diversity.
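
A rough sketch of this optimization loop is shown below, assuming PyTorch and the openai/CLIP package. It is not TediGAN's actual implementation: DummyGenerator is a stand-in for a pre-trained StyleGAN generator, the starting latent code would normally come from GAN inversion of a real photo, and the prompt, learning rate, step count, and loss weights are arbitrary illustrative choices.

# Sketch only: CLIP-guided optimization of a latent code with a frozen generator.
import torch
import torch.nn.functional as F
import clip  # pip install git+https://github.com/openai/CLIP.git


class DummyGenerator(torch.nn.Module):
    """Placeholder with a StyleGAN-like interface: latent (1, 512) -> image (1, 3, 128, 128) in [-1, 1]."""
    def __init__(self, latent_dim=512, resolution=128):
        super().__init__()
        self.resolution = resolution
        self.fc = torch.nn.Linear(latent_dim, 3 * resolution * resolution)

    def forward(self, w):
        x = torch.tanh(self.fc(w))
        return x.view(-1, 3, self.resolution, self.resolution)


device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device)
generator = DummyGenerator().to(device).eval()
for p in list(clip_model.parameters()) + list(generator.parameters()):
    p.requires_grad_(False)  # only the latent code is optimized

# Target description; in the real pipeline this is the user's text prompt.
text = clip.tokenize(["a smiling young woman with blonde hair"]).to(device)
with torch.no_grad():
    text_features = F.normalize(clip_model.encode_text(text).float(), dim=-1)

# Starting latent code; TediGAN obtains this via GAN inversion of the input image.
w = torch.randn(1, 512, device=device, requires_grad=True)
w_init = w.detach().clone()
optimizer = torch.optim.Adam([w], lr=0.01)

for step in range(200):
    image = generator(w)
    # Map [-1, 1] -> [0, 1] and resize to CLIP's 224x224 input
    # (CLIP's own channel normalization is omitted here for brevity).
    image_224 = F.interpolate((image + 1) / 2, size=(224, 224), mode="bilinear", align_corners=False)
    image_features = F.normalize(clip_model.encode_image(image_224).float(), dim=-1)

    clip_loss = 1 - (image_features * text_features).sum()  # pull the image toward the text
    latent_reg = F.mse_loss(w, w_init)                       # stay close to the inverted code
    loss = clip_loss + 0.1 * latent_reg

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()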

Quick Start & Requirements

  • Install: pip install ftfy regex tqdm and pip install git+https://github.com/openai/CLIP.git (a quick sanity check for the CLIP install is sketched after this list).
  • Prerequisites: Python, PyTorch, and CUDA (optional, for GPU acceleration). Pretrained StyleGAN models are required.
  • Running: Use python invert.py for manipulation/generation, or streamlit run streamlit_app.py to launch a web demo.
  • Links: TediGAN Preprint, Colab Demo, Pretrained Models
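
As a quick sanity check that the CLIP dependency installed correctly, the snippet below encodes an arbitrary prompt and a blank placeholder image and prints their cosine similarity; the model name and prompt are illustrative choices, not taken from the TediGAN scripts.

# Verify the openai/CLIP install used by the Quick Start steps above.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)  # downloads weights on first use

image = preprocess(Image.new("RGB", (224, 224))).unsqueeze(0).to(device)  # blank placeholder image
text = clip.tokenize(["a photo of a smiling person"]).to(device)

with torch.no_grad():
    similarity = torch.cosine_similarity(model.encode_image(image), model.encode_text(text))

print(similarity.item())  # any finite value confirms CLIP loads and runs end to end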

Highlighted Details

  • Supports high-resolution (1024×1024) and multi-modal generation.
  • Offers both image manipulation and diverse image generation capabilities.
  • Integrates with CLIP for powerful text-image alignment.
  • Provides example scripts for training StyleGAN generators and performing GAN inversion.

Maintenance & Community

The TediGAN paper was published at CVPR 2021, and an extended version is available on arXiv. Contact information for the primary author is provided in the README. An online demo, implemented with Cog, is also available.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. However, it heavily relies on and borrows code from other projects like genforce and CLIP, whose licenses should be consulted for compatibility, especially for commercial use.

Limitations & Caveats

The README notes that training StyleGAN generators requires significant computational resources (e.g., 8 GPUs). Out-of-memory (OOM) errors can occur at high resolutions, and the repository provides a flag to mitigate this. The project focuses primarily on face image generation and manipulation.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 90 days
