Domain adaptation for image generators via text prompts
Top 33.6% on sourcepulse
StyleGAN-NADA enables text-guided adaptation of pre-trained image generators to new domains without requiring any images from the target domain. This is achieved by leveraging CLIP's semantic understanding to align generated images with textual descriptions, allowing for novel style and shape transformations. The primary audience is researchers and practitioners working with generative models who need to adapt them to specific styles or concepts efficiently.
How It Works
The method employs two copies of a pre-trained generator: one is held frozen as a reference while the other is trained. Training minimizes a directional CLIP loss, which pushes the direction between the CLIP embeddings of the two generators' outputs (for the same latent codes) to align with the direction between the CLIP embeddings of a source text prompt and a target text prompt. This enables "blind" domain adaptation, relying solely on CLIP's semantic guidance rather than on any target-domain images.
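A minimal sketch of this directional loss, assuming OpenAI's `clip` package and PyTorch; the prompt pair, preprocessing shortcut, and function names are illustrative rather than the repository's exact code.

```python
# Sketch of a CLIP directional loss (not the repo's exact implementation).
# Assumes OpenAI's `clip` package: pip install git+https://github.com/openai/CLIP.git
import torch
import torch.nn.functional as F
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
clip_model, _ = clip.load("ViT-B/32", device=device, jit=False)
clip_model = clip_model.float().eval()  # keep everything in fp32 for simplicity

def text_direction(source_prompt: str, target_prompt: str) -> torch.Tensor:
    """CLIP-space direction from the source domain description to the target one."""
    tokens = clip.tokenize([source_prompt, target_prompt]).to(device)
    with torch.no_grad():
        emb = clip_model.encode_text(tokens)
    emb = emb / emb.norm(dim=-1, keepdim=True)
    return emb[1] - emb[0]

def directional_clip_loss(frozen_images, trained_images, text_dir):
    """1 - cos(image-space direction, text-space direction).

    Both batches come from the same latent codes: `frozen_images` from the frozen
    generator, `trained_images` from the one being adapted. Images are assumed to
    already be resized/normalized the way CLIP expects (224x224).
    """
    img_dir = clip_model.encode_image(trained_images) - clip_model.encode_image(frozen_images)
    return (1.0 - F.cosine_similarity(img_dir, text_dir.unsqueeze(0), dim=-1)).mean()

# Example: steer a photo generator toward sketches.
t_dir = text_direction("photo", "sketch")
```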
Quick Start & Requirements
Setup uses `conda` and `pip`. Training is run with `python train.py`, with various arguments for customization.
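`train.py` wires the loss above around StyleGAN2 and CLIP. Purely for orientation, here is a structural sketch of that loop with stand-ins for both: `ToyGenerator`, the random linear encoder, and the random "text direction" are placeholders, not the repository's classes; only the wiring (frozen copy, trainable copy, directional loss, optimizer) mirrors the method.

```python
# Structural sketch of the two-generator training loop (stand-ins only).
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyGenerator(nn.Module):
    """Placeholder for a pre-trained generator mapping latents to images."""
    def __init__(self, latent_dim=64, image_size=32):
        super().__init__()
        self.image_size = image_size
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, 3 * image_size * image_size), nn.Tanh(),
        )
    def forward(self, z):
        return self.net(z).view(-1, 3, self.image_size, self.image_size)

g_frozen = ToyGenerator()            # reference generator, never updated
g_train = copy.deepcopy(g_frozen)    # copy being adapted toward the target domain
for p in g_frozen.parameters():
    p.requires_grad_(False)

# Stand-ins for the CLIP image encoder and the "source -> target" text direction.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128))
for p in encoder.parameters():
    p.requires_grad_(False)
text_dir = torch.randn(1, 128)

optimizer = torch.optim.Adam(g_train.parameters(), lr=2e-3)
for step in range(300):
    z = torch.randn(8, 64)                                # shared latent codes
    img_dir = encoder(g_train(z)) - encoder(g_frozen(z))  # image-space direction
    loss = (1.0 - F.cosine_similarity(img_dir, text_dir, dim=-1)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```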
Highlighted Details
In addition to text prompts, adaptation can be guided by example images from the target domain via `--style_img_dir`.
Maintenance & Community
The project's paper was published at SIGGRAPH 2022. Updates include a HuggingFace Spaces demo, StyleGAN-XL support, and Replicate.ai integration. Links to pre-trained models hosted on Google Drive are provided.
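A typical workflow with the released checkpoints is to load the weights into the repository's StyleGAN2 generator and sample from it. The sketch below assumes a rosinality-style `Generator` class, a `g_ema` weight key, and a placeholder filename and resolution; check the actual import path, key names, and model size against the repository and checkpoint.

```python
# Sketch of sampling from an adapted checkpoint (names and paths are assumptions).
import torch
from model import Generator  # adjust to where the repo defines its StyleGAN2 Generator

device = "cuda" if torch.cuda.is_available() else "cpu"
ckpt = torch.load("sketch_ffhq.pt", map_location=device)  # hypothetical filename

g_ema = Generator(size=1024, style_dim=512, n_mlp=8).to(device)
g_ema.load_state_dict(ckpt["g_ema"])  # key name may differ per checkpoint
g_ema.eval()

with torch.no_grad():
    mean_latent = g_ema.mean_latent(4096)    # average W vector used for truncation
    z = torch.randn(4, 512, device=device)
    images, _ = g_ema([z], truncation=0.7, truncation_latent=mean_latent)
```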
Licensing & Compatibility
The repository's license is not explicitly stated in the README. However, it relies on CLIP (MIT License) and StyleGAN2 (MIT License). Compatibility for commercial use or closed-source linking would require clarification of the specific license terms for StyleGAN-NADA itself.
Limitations & Caveats
StyleGAN3/XL fine-tuning may exhibit grid artifacts, and these models do not currently support layer freezing. Smaller GPUs might encounter memory limitations with the Docker UI. The README mentions that some edits might work better with specific ReStyle encoders (pSp vs. e4e).