MaskGIT-pytorch by dome272

PyTorch implementation of the MaskGIT research paper

Created 3 years ago

467 stars

Top 65.1% on SourcePulse

Project Summary

This repository provides a PyTorch implementation of MaskGIT, a masked generative image transformer. MaskGIT aims to improve image generation quality by replacing the unidirectional (autoregressive) transformer in VQGAN's second stage with a bidirectional one, inspired by BERT's masked language modeling approach. The repository is aimed at researchers and practitioners interested in token-based generative image models.

How It Works

MaskGIT replaces the second stage of VQGAN with a bidirectional transformer. The transformer is trained by masking out random tokens in the quantized image representation and predicting them, similar to BERT. The masking percentage varies per batch. Inference starts from a fully masked token grid and proceeds iteratively: at each step the model predicts all masked tokens, the most confident predictions are kept, and the rest stay masked for the next step. Because every prediction can attend to tokens on all sides, generation is more context-aware than with a unidirectional model.
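
The masking step used during training can be illustrated with a small, framework-free sketch. It is not taken from the repository: `MASK_ID`, `mask_tokens`, and the codebook size of 1024 are hypothetical names and values chosen for illustration, and the cosine mapping of the random ratio follows the schedule described in the MaskGIT paper, which this implementation may or may not use in the same form.

```python
import math
import random

MASK_ID = 1024  # hypothetical id reserved for the mask token (codebook size assumed to be 1024)


def mask_tokens(tokens, rng):
    """Replace a random subset of tokens with MASK_ID.

    Returns (masked_tokens, is_masked); is_masked marks the positions the
    transformer would be trained to predict.
    """
    n = len(tokens)
    # One ratio is drawn per call, i.e. per batch. Mapping the uniform draw
    # through a cosine is an assumption based on the paper's schedule.
    ratio = math.cos(rng.random() * math.pi / 2)
    num_masked = max(1, math.ceil(ratio * n))
    masked_positions = set(rng.sample(range(n), num_masked))
    masked = [MASK_ID if i in masked_positions else t for i, t in enumerate(tokens)]
    is_masked = [i in masked_positions for i in range(n)]
    return masked, is_masked


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = [rng.randrange(1024) for _ in range(256)]  # stand-in for VQGAN token indices
    masked, is_masked = mask_tokens(tokens, rng)
    print(f"masked {sum(is_masked)} of {len(tokens)} tokens")
```

In a real training loop the loss would typically be computed on the masked positions, with the unmasked tokens serving as context.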

Quick Start & Requirements

Install via pip install -r requirements.txt.
Requires PyTorch, NumPy, Pillow, and other standard Python libraries.
Training scripts training_vqgan.py and training_transformer.py are provided.
Dataset paths need to be manually edited in the training scripts.

Highlighted Details

Implements the core MaskGIT architecture, focusing on the bidirectional transformer stage.
Training scripts for both VQGAN and the transformer are available.
Inference algorithm for iterative generation from a masked state is included.
Includes transformer.py for direct access to the core model components.
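
The iterative inference procedure mentioned above can be sketched as follows. This is a simplified illustration under stated assumptions, not the repository's code: `dummy_predict` is a hypothetical stand-in for the trained transformer, `iterative_decode` and `MASK_ID` are made-up names, and the cosine schedule for how many tokens remain masked follows the MaskGIT paper. The sketch also commits the highest-confidence guesses directly, whereas a real implementation may sample tokens and add noise to the confidences.

```python
import math
import random

MASK_ID = 1024  # hypothetical mask token id (codebook size assumed to be 1024)


def dummy_predict(tokens, rng):
    """Hypothetical stand-in for the bidirectional transformer.

    Returns one (token_id, confidence) guess per position.
    """
    return [(rng.randrange(MASK_ID), rng.random()) for _ in tokens]


def iterative_decode(num_tokens, steps, predict, rng):
    """Fill a fully masked token sequence over a fixed number of steps."""
    tokens = [MASK_ID] * num_tokens  # start from a fully masked image
    for step in range(1, steps + 1):
        masked = [i for i, t in enumerate(tokens) if t == MASK_ID]
        if not masked:
            break  # everything has already been committed
        guesses = predict(tokens, rng)
        # Cosine schedule: number of tokens that should stay masked after this step.
        keep_masked = math.floor(num_tokens * math.cos(math.pi / 2 * step / steps))
        # Always commit at least one token so every step makes progress.
        keep_masked = min(keep_masked, len(masked) - 1)
        # Commit the most confident guesses; the rest stay masked.
        masked.sort(key=lambda i: guesses[i][1], reverse=True)
        for i in masked[: len(masked) - keep_masked]:
            tokens[i] = guesses[i][0]
    return tokens


if __name__ == "__main__":
    rng = random.Random(0)
    result = iterative_decode(num_tokens=256, steps=8, predict=dummy_predict, rng=rng)
    print(f"remaining masked tokens: {result.count(MASK_ID)}")
```

With the cosine schedule only a few tokens are committed in the early steps and most are filled in near the end, so later predictions can condition on more context.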

Maintenance & Community

This project is marked as "work in progress." The official implementation is available at google-research/maskgit. There are no explicit links to community channels or roadmaps provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. Without one, reuse rights are unclear, so users should treat the code as research-only and confirm terms with the author before any commercial use.

Limitations & Caveats

The implementation is a work in progress. The README notes that the training data used is significantly smaller than in the original paper, so results should not be expected to match it. Hyperparameters still require tuning, and image editing features such as inpainting are still under development.
