PyTorch implementation of the MaskGIT research paper
This repository provides a PyTorch implementation of MaskGIT, a masked generative image transformer. It aims to improve image generation quality by replacing the unidirectional transformer in VQGAN with a bidirectional one, inspired by BERT's masked language modeling approach. This is suitable for researchers and practitioners interested in state-of-the-art generative image models.
How It Works
MaskGIT enhances the second stage of VQGAN by employing a bidirectional transformer. This transformer is trained by masking out random tokens in the image representation and predicting them, similar to BERT. The masking percentage varies per batch, and inference involves iteratively sampling confident predictions from a fully masked image. This bidirectional approach allows for more context-aware generation compared to unidirectional models.
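The iterative inference described above can be sketched with the cosine masking schedule from the MaskGIT paper: early decoding steps reveal only a few high-confidence tokens, and later steps reveal progressively more. This is a minimal illustrative sketch; the function names are hypothetical and not this repository's API.

```python
import math

def mask_schedule(step: int, total_steps: int, num_tokens: int) -> int:
    """Cosine schedule from the MaskGIT paper: number of tokens that
    remain masked after decoding step `step` out of `total_steps`."""
    ratio = math.cos(math.pi / 2 * (step + 1) / total_steps)
    return math.floor(num_tokens * ratio)

def tokens_to_unmask(total_steps: int, num_tokens: int) -> list[int]:
    """How many tokens to reveal at each step. In the real decoder, the
    revealed tokens would be the model's most confident predictions and
    the rest would be re-masked for the next iteration."""
    masked = num_tokens
    revealed_per_step = []
    for step in range(total_steps):
        next_masked = mask_schedule(step, total_steps, num_tokens)
        revealed_per_step.append(masked - next_masked)
        masked = next_masked
    return revealed_per_step
```

For a 16x16 token grid (256 tokens) decoded in 8 steps, the schedule reveals few tokens at first and many in the final steps, with all tokens unmasked by the last iteration.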
Quick Start & Requirements
Install the dependencies with:

pip install -r requirements.txt

Training scripts training_vqgan.py and training_transformer.py are provided.

Highlighted Details
The standalone transformer.py module offers direct access to the core model components.

Maintenance & Community
This project is marked as "work in progress." The official implementation is available at google-research/maskgit. The README provides no links to community channels or a roadmap.
Licensing & Compatibility
The repository does not explicitly state a license. Since it implements a research paper, users should treat it as research code and verify licensing terms before any commercial use.
Limitations & Caveats
The implementation is a work in progress, and the README notes that the training data used is significantly smaller than the original paper's. Hyperparameters require tuning, and image editing functionalities like inpainting are still under development.
Last updated 2 years ago; the repository is inactive.