PyTorch implementation of high-resolution image synthesis with transformers (VQGAN)
This repository provides a PyTorch implementation for "Taming Transformers for High-Resolution Image Synthesis," enabling efficient and expressive image generation by combining convolutional VQGANs with autoregressive transformers. It's targeted at researchers and practitioners in computer vision and generative modeling looking to achieve state-of-the-art results in high-resolution image synthesis.
How It Works
The core approach uses a VQGAN (Vector Quantized Generative Adversarial Network) to learn a codebook of visual parts, effectively compressing images into discrete tokens. An autoregressive transformer then models the composition of these tokens, allowing for high-resolution synthesis. This hybrid approach leverages the efficiency of convolutions for local feature extraction and the global context modeling power of transformers.
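To make the two-stage idea concrete, here is a minimal PyTorch sketch. The names (`ToyVQEncoder`, `ToyTokenTransformer`) and sizes are illustrative inventions, not the repository's actual classes, and all training losses are omitted:

```python
import torch
import torch.nn as nn

class ToyVQEncoder(nn.Module):
    """Stage 1 (sketch): downsample an image convolutionally, then snap each
    spatial latent to its nearest codebook entry, yielding discrete tokens."""
    def __init__(self, codebook_size=1024, embed_dim=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 128, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(128, embed_dim, 4, stride=2, padding=1),
        )
        self.codebook = nn.Embedding(codebook_size, embed_dim)

    def forward(self, x):
        z = self.conv(x)                               # (B, D, H/4, W/4)
        b, d, h, w = z.shape
        flat = z.permute(0, 2, 3, 1).reshape(-1, d)    # one latent per position
        # nearest codebook entry per latent (Euclidean distance)
        dists = torch.cdist(flat, self.codebook.weight)
        return dists.argmin(dim=1).view(b, h * w)      # discrete token grid

class ToyTokenTransformer(nn.Module):
    """Stage 2 (sketch): model the flattened token sequence autoregressively.
    A stock nn.TransformerEncoder with a causal mask stands in for the
    GPT-style model used in the paper."""
    def __init__(self, codebook_size=1024, d_model=256, seq_len=64):
        super().__init__()
        self.tok = nn.Embedding(codebook_size, d_model)
        self.pos = nn.Embedding(seq_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, codebook_size)

    def forward(self, idx):
        b, t = idx.shape
        h = self.tok(idx) + self.pos(torch.arange(t, device=idx.device))
        mask = nn.Transformer.generate_square_subsequent_mask(t).to(idx.device)
        h = self.blocks(h, mask=mask)
        return self.head(h)                            # next-token logits

img = torch.randn(1, 3, 32, 32)
tokens = ToyVQEncoder()(img)            # (1, 64) codebook indices
logits = ToyTokenTransformer()(tokens)  # (1, 64, 1024)
```

In the full VQGAN, gradients pass through the non-differentiable quantization step via a straight-through estimator, and the encoder/decoder are trained with perceptual and adversarial losses before the transformer is fit to the resulting token sequences.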
Quick Start & Requirements
```bash
conda env create -f environment.yaml
conda activate taming
```
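Once the environment is active, training runs are launched through main.py with a YAML config. The FacesHQ transformer config below is one example from the repository; adjust the config path and GPU flag for your setup:

```bash
python main.py --base configs/faceshq_transformer.yaml -t True --gpus 0,
```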
Maintenance & Community
The project accompanies the CVPR 2021 paper. README updates from 2022 announce new pretrained VQGANs for Latent Diffusion Models as well as scene synthesis models.
Licensing & Compatibility
The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking would require clarification.
Limitations & Caveats
Data preparation for datasets like ImageNet can be time-consuming and requires significant disk space. Some models depend on specific versions of external tools (e.g., MiDaS v2.0 for depth map generation). The README also mentions a bugfix for the quantizer; the fix is disabled by default to preserve backward compatibility.