PyTorch implementation of the Image GPT paper
This repository provides a PyTorch implementation of OpenAI's Image GPT (iGPT), a generative model that treats images as sequences of pixels. It aims to reproduce the results from the "Generative Pretraining from Pixels" paper, enabling users to train and sample from image generation models.
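Sampling from such a model follows the autoregressive chain rule p(x) = ∏ p(x_i | x_<i): tokens are drawn one at a time, each conditioned on what has been generated so far. A minimal sketch of that loop, where a fixed bigram table stands in for the transformer's predicted next-token distribution (the function and parameter names here are illustrative, not taken from the repository):

```python
import numpy as np

def sample_sequence(cond_probs, length, start_token=0, seed=0):
    # Autoregressive sampling: draw each token from a distribution
    # conditioned on the previous token. Here cond_probs is a (k, k)
    # bigram table standing in for a transformer's full-context output.
    rng = np.random.default_rng(seed)
    seq = [start_token]
    for _ in range(length - 1):
        p = cond_probs[seq[-1]]          # next-token distribution
        seq.append(rng.choice(len(p), p=p))
    return seq
```

The real model replaces the bigram lookup with a forward pass over the whole generated prefix, but the token-by-token structure of generation is the same.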
How It Works
The core approach quantizes images into discrete tokens using k-means clustering, then applies a GPT-like transformer architecture to model the pixel sequence autoregressively. This allows for generative pre-training and subsequent fine-tuning for tasks like classification. The advantage lies in leveraging the proven success of transformer architectures for sequential data in the image domain.
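As a concrete illustration of the quantization step, here is a minimal numpy sketch: Lloyd's k-means learns a small color palette, and each pixel is then replaced by the index of its nearest centroid, giving a flat token sequence. All names and the cluster count are illustrative; the repository's implementation may differ:

```python
import numpy as np

def kmeans_palette(pixels, k=8, iters=10, seed=0):
    # Lloyd's algorithm: learn k centroid colors from an (N, C) pixel array.
    pixels = pixels.astype(np.float64)
    rng = np.random.default_rng(seed)
    centroids = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(iters):
        # Assign each pixel to its nearest centroid (squared distance).
        d = ((pixels[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(axis=1)
        # Move each centroid to the mean of its assigned pixels.
        for j in range(k):
            members = pixels[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids

def quantize(image, centroids):
    # Replace each pixel with its nearest-centroid index and flatten the
    # (H, W, C) image into a 1-D token sequence in raster order.
    h, w, c = image.shape
    flat = image.reshape(-1, c).astype(np.float64)
    d = ((flat[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)
```

The resulting integer sequence is what the GPT-style transformer models autoregressively.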
Quick Start & Requirements
Dependencies are installed with pip, and the training data is downloaded with the ../download.sh script.
Maintenance & Community
The project is marked as "WIP" (Work In Progress). No specific community channels or notable contributors are mentioned in the README.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project is a work in progress with several planned features yet to be implemented, including batched k-means on GPU, BERT-style pre-training, and loading OpenAI's official pre-trained models. Reproducing iGPT-S results is a stated goal but may require significant compute resources.
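One of the planned items, batched k-means on GPU, essentially amounts to computing centroid assignments in fixed-size chunks so the full N×k distance matrix never has to be materialized at once; the same pattern maps directly onto GPU tensors. A hypothetical numpy sketch of the batched assignment step (function and parameter names are ours, not the repository's):

```python
import numpy as np

def assign_batched(pixels, centroids, batch_size=4096):
    # Nearest-centroid assignment computed batch by batch: peak memory is
    # bounded by the (batch_size, k) distance block instead of (N, k).
    out = np.empty(len(pixels), dtype=np.int64)
    for start in range(0, len(pixels), batch_size):
        chunk = pixels[start:start + batch_size]
        d = ((chunk[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        out[start:start + batch_size] = d.argmin(axis=1)
    return out
```

Swapping the numpy arrays for torch tensors on a CUDA device would give the GPU variant with the same memory bound.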