PyTorch implementation of the Image GPT paper
This repository provides a PyTorch implementation of OpenAI's Image GPT (iGPT), a generative model that treats images as sequences of pixels. It aims to reproduce the results from the "Generative Pretraining from Pixels" paper, enabling users to train and sample from image generation models.
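Sampling from such a model follows the autoregressive chain rule p(x) = ∏ p(x_i | x_<i): tokens are drawn one at a time, each conditioned on what has been generated so far. A minimal sketch of that loop, where a fixed bigram table stands in for the transformer's predicted next-token distribution (the function and parameter names here are illustrative, not taken from the repository):

```python
import numpy as np

def sample_sequence(cond_probs, length, start_token=0, seed=0):
    # Autoregressive sampling: draw each token from a distribution
    # conditioned on the previous token. Here cond_probs is a (k, k)
    # bigram table standing in for a transformer's full-context output.
    rng = np.random.default_rng(seed)
    seq = [start_token]
    for _ in range(length - 1):
        p = cond_probs[seq[-1]]          # next-token distribution
        seq.append(rng.choice(len(p), p=p))
    return seq
```

The real model replaces the bigram lookup with a forward pass over the whole generated prefix, but the token-by-token structure of generation is the same.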
How It Works
The core approach quantizes images into discrete tokens using k-means clustering, then applies a GPT-like transformer architecture to model the pixel sequence autoregressively. This allows for generative pre-training and subsequent fine-tuning for tasks like classification. The advantage lies in leveraging the proven success of transformer architectures for sequential data in the image domain.
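As a concrete illustration of the quantization step, here is a minimal numpy sketch: Lloyd's k-means learns a small color palette, and each pixel is then replaced by the index of its nearest centroid, giving a flat token sequence. All names and the cluster count are illustrative; the repository's implementation may differ:

```python
import numpy as np

def kmeans_palette(pixels, k=8, iters=10, seed=0):
    # Lloyd's algorithm: learn k centroid colors from an (N, C) pixel array.
    pixels = pixels.astype(np.float64)
    rng = np.random.default_rng(seed)
    centroids = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(iters):
        # Assign each pixel to its nearest centroid (squared distance).
        d = ((pixels[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(axis=1)
        # Move each centroid to the mean of its assigned pixels.
        for j in range(k):
            members = pixels[assign == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids

def quantize(image, centroids):
    # Replace each pixel with its nearest-centroid index and flatten the
    # (H, W, C) image into a 1-D token sequence in raster order.
    h, w, c = image.shape
    flat = image.reshape(-1, c).astype(np.float64)
    d = ((flat[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)
```

The resulting integer sequence is what the GPT-style transformer models autoregressively.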
Quick Start & Requirements
Dependencies are installed with pip, and the training data is downloaded with the ../download.sh script.
Maintenance & Community
The project is marked as "WIP" (Work In Progress). No specific community channels or notable contributors are mentioned in the README.
Licensing & Compatibility
The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project is a work in progress with several planned features yet to be implemented, including batched k-means on GPU, BERT-style pre-training, and loading OpenAI's official pre-trained models. Reproducing iGPT-S results is a stated goal but may require significant compute resources.
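One of the planned items, batched k-means on GPU, essentially amounts to computing centroid assignments in fixed-size chunks so the full N×k distance matrix never has to be materialized at once; the same pattern maps directly onto GPU tensors. A hypothetical numpy sketch of the batched assignment step (function and parameter names are ours, not the repository's):

```python
import numpy as np

def assign_batched(pixels, centroids, batch_size=4096):
    # Nearest-centroid assignment computed batch by batch: peak memory is
    # bounded by the (batch_size, k) distance block instead of (N, k).
    out = np.empty(len(pixels), dtype=np.int64)
    for start in range(0, len(pixels), batch_size):
        chunk = pixels[start:start + batch_size]
        d = ((chunk[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        out[start:start + batch_size] = d.argmin(axis=1)
    return out
```

Swapping the numpy arrays for torch tensors on a CUDA device would give the GPU variant with the same memory bound.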