MaskGIT-pytorch by dome272

PyTorch implementation of the MaskGIT research paper

Created 3 years ago
450 stars

Top 66.9% on SourcePulse

Project Summary

This repository provides a PyTorch implementation of MaskGIT, a masked generative image transformer. It aims to improve image generation quality by replacing the unidirectional transformer used in VQGAN's second stage with a bidirectional one, inspired by BERT's masked language modeling approach. It is aimed at researchers and practitioners interested in transformer-based generative image models.

How It Works

MaskGIT enhances the second stage of VQGAN by employing a bidirectional transformer. This transformer is trained by masking out random tokens in the image representation and predicting them, similar to BERT. The masking percentage varies per batch, and inference involves iteratively sampling confident predictions from a fully masked image. This bidirectional approach allows for more context-aware generation compared to unidirectional models.
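The training-time masking described above can be sketched in plain Python. This is a minimal illustration, not the repository's code: the names `MASK_ID`, `cosine_mask_ratio`, and `mask_tokens` are hypothetical, the cosine mapping from a uniform sample to a mask ratio is an assumption taken from the MaskGIT paper's schedule, and the ratio is drawn per call here, whereas the summary says it varies per batch.

```python
import math
import random

MASK_ID = -1  # hypothetical id reserved for the [MASK] token (assumption)


def cosine_mask_ratio(r: float) -> float:
    """Map r in [0, 1] to a mask ratio in [0, 1]; r = 0 masks everything."""
    return math.cos(0.5 * math.pi * r)


def mask_tokens(tokens, rng):
    """Return (masked_tokens, masked_positions) for one sequence of image tokens."""
    ratio = cosine_mask_ratio(rng.random())
    # Mask at least one token so there is always something to predict.
    num_masked = max(1, math.ceil(ratio * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), num_masked))
    masked = list(tokens)
    for pos in positions:
        masked[pos] = MASK_ID
    return masked, positions


rng = random.Random(0)
# Stand-in for a 4x4 grid of VQGAN codebook indices (hypothetical vocabulary of 1024).
tokens = [rng.randrange(1024) for _ in range(16)]
masked, positions = mask_tokens(tokens, rng)
```

In a real training step, the transformer would receive `masked` as input and the loss would compare its predictions with the original `tokens`, typically at the masked positions.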

Quick Start & Requirements

  • Install via pip install -r requirements.txt.
  • Requires PyTorch, NumPy, Pillow, and other standard Python libraries.
  • Training scripts training_vqgan.py and training_transformer.py are provided.
  • Dataset paths need to be manually edited in the training scripts.

Highlighted Details

  • Implements the core MaskGIT architecture, focusing on the bidirectional transformer stage.
  • Training scripts for both VQGAN and the transformer are available.
  • Inference algorithm for iterative generation from a masked state is included.
  • Includes transformer.py for direct access to the core model components.
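
The iterative inference mentioned in the list above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in transformer.py: `dummy_predict` is a hypothetical stand-in that returns random tokens and confidences instead of calling a trained model, `MASK_ID` and `iterative_decode` are made-up names, and the cosine schedule for how many tokens stay masked is assumed from the MaskGIT paper.

```python
import math
import random

MASK_ID = -1  # hypothetical id for the [MASK] token (assumption)


def dummy_predict(tokens, rng, vocab_size=1024):
    """Hypothetical stand-in for the transformer: one (token, confidence) guess per position."""
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def iterative_decode(num_tokens, steps, rng):
    """Start from a fully masked sequence and commit confident predictions step by step."""
    tokens = [MASK_ID] * num_tokens
    for step in range(1, steps + 1):
        masked = [i for i, tok in enumerate(tokens) if tok == MASK_ID]
        if not masked:
            break  # everything has already been filled in
        preds = dummy_predict(tokens, rng)
        # Cosine schedule: number of tokens that should remain masked after this step.
        keep_masked = math.floor(math.cos(0.5 * math.pi * step / steps) * num_tokens)
        # Always commit at least one token so every step makes progress.
        keep_masked = min(keep_masked, len(masked) - 1)
        # Most confident predictions first; the last `keep_masked` stay masked.
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[: len(masked) - keep_masked]:
            tokens[i] = preds[i][0]
    return tokens


image_tokens = iterative_decode(num_tokens=16, steps=8, rng=random.Random(0))
```

With a trained model in place of `dummy_predict`, the resulting token grid would be passed to the VQGAN decoder to produce an image. The sketch omits sampling temperature and any noise added to the confidences.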

Maintenance & Community

This project is marked as "work in progress." The official implementation is available at google-research/maskgit. There are no explicit links to community channels or roadmaps provided in the README.

Licensing & Compatibility

The repository does not explicitly state a license. Users should treat it as research code and confirm the licensing terms with the author before any commercial use.

Limitations & Caveats

The implementation is a work in progress, and the README notes that the training data used is significantly smaller than the original paper's. Hyperparameters require tuning, and image editing functionalities like inpainting are still under development.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 30 days
