StyleGAN-NADA by rinongal

Domain adaptation for image generators via text prompts

created 4 years ago
1,184 stars

Top 33.6% on sourcepulse

Project Summary

StyleGAN-NADA enables text-guided adaptation of pre-trained image generators to new domains without requiring any images from the target domain. This is achieved by leveraging CLIP's semantic understanding to align generated images with textual descriptions, allowing for novel style and shape transformations. The primary audience is researchers and practitioners working with generative models who need to adapt them to specific styles or concepts efficiently.

How It Works

The method trains two paired copies of a pre-trained generator: one is frozen as a reference while the other is fine-tuned. Training optimizes a directional CLIP loss, which pushes the shift between the CLIP embeddings of images produced by the frozen and trained generators to align with the shift between the CLIP embeddings of a source and a target text prompt (e.g. "photo" to "sketch"). This enables "blind" domain adaptation, relying solely on CLIP's semantic guidance rather than any target-domain images.
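
A minimal sketch of that directional loss, assuming OpenAI's clip package; the repository's own implementation wraps this in a loss class with extra options (multiple CLIP models, weighting), so treat this as illustrative rather than the exact code:

    import torch
    import clip

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, _ = clip.load("ViT-B/32", device=device)

    def embed_text(prompt):
        # Encode a prompt such as "photo" or "sketch" into CLIP space.
        tokens = clip.tokenize([prompt]).to(device)
        with torch.no_grad():
            return model.encode_text(tokens).float()

    def directional_loss(frozen_imgs, trained_imgs, source_text, target_text):
        # frozen_imgs / trained_imgs: image batches from the frozen and trained
        # generators, already resized and normalized to CLIP's expected input.
        delta_t = embed_text(target_text) - embed_text(source_text)
        delta_i = (model.encode_image(trained_imgs).float()
                   - model.encode_image(frozen_imgs).float())
        delta_i = delta_i / delta_i.norm(dim=-1, keepdim=True)
        delta_t = delta_t / delta_t.norm(dim=-1, keepdim=True)
        # 1 - cosine similarity between the image-space and text-space shifts.
        return (1.0 - (delta_i * delta_t).sum(dim=-1)).mean()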

Quick Start & Requirements

  • Install: Clone the repository and install dependencies via conda and pip.
  • Prerequisites: Anaconda, PyTorch 1.7.1, CUDA toolkit, CLIP (from OpenAI's GitHub), pre-trained StyleGAN2 generator.
  • Usage: A Colab notebook is provided for interactive use; command-line training runs via python train.py (see the example command after this list).
  • Demo: available on HuggingFace Spaces.
  • Docs: the project website and Colab notebook offer detailed guidance.
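
A representative training command, adapted from the README's text-guided example (paths are placeholders, and exact flags and defaults may differ between versions, so verify against python train.py --help):

    python train.py --size 1024 \
                    --batch 2 \
                    --n_sample 4 \
                    --output_dir /path/to/output \
                    --lr 0.002 \
                    --frozen_gen_ckpt /path/to/stylegan2-ffhq-config-f.pt \
                    --iter 301 \
                    --source_class "photo" \
                    --target_class "sketch" \
                    --auto_layer_k 18 \
                    --auto_layer_iters 1 \
                    --output_interval 50 \
                    --save_interval 150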

Highlighted Details

  • Adapts generators to new domains using only text prompts.
  • Supports adaptation toward the style of reference images via --style_img_dir (see the example after this list).
  • Compatible with StyleGAN3 and StyleGAN-XL models with specific flags.
  • Enables cross-domain interpolation videos and out-of-domain editing of real images using inversion techniques like ReStyle.
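
A hedged sketch of the image-styled variant flagged above, under the same assumptions as the Quick Start command; the README documents the exact flag combination for --style_img_dir, so check it before running:

    python train.py --size 1024 \
                    --batch 2 \
                    --output_dir /path/to/output \
                    --frozen_gen_ckpt /path/to/stylegan2-ffhq-config-f.pt \
                    --iter 301 \
                    --source_class "photo" \
                    --target_class "sketch" \
                    --style_img_dir /path/to/style_images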

Maintenance & Community

The project accompanies a SIGGRAPH 2022 paper. README updates note a HuggingFace Spaces demo, StyleGAN-XL support, and Replicate.ai integration, and a Google Drive folder with pre-trained models is linked.

Licensing & Compatibility

The repository's license is not explicitly stated in the README. It depends on CLIP (MIT License) and the rosinality StyleGAN2 PyTorch port (MIT License), though pre-trained weights converted from NVIDIA's official models may carry NVIDIA's own license terms. Commercial use or closed-source linking would require clarifying the license of StyleGAN-NADA itself.

Limitations & Caveats

StyleGAN3/XL fine-tuning may exhibit grid artifacts, and those models do not currently support layer freezing. The Docker UI may hit memory limits on smaller GPUs. The README also notes that some edits work better with a particular ReStyle encoder (pSp vs. e4e).

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

10 stars in the last 90 days
