image-gpt by teddykoker

PyTorch implementation of Image GPT research paper

created 5 years ago
258 stars

Top 98.6% on sourcepulse

Project Summary

This repository provides a PyTorch implementation of OpenAI's Image GPT (iGPT), a generative model that treats images as sequences of pixels. It aims to reproduce the results from the "Generative Pretraining from Pixels" paper, enabling users to train and sample from image generation models.

How It Works

The core approach quantizes each pixel's color to the nearest of a set of centroids found by k-means clustering, which turns an image into a sequence of discrete tokens. A GPT-like transformer then models that sequence autoregressively, predicting each pixel token from the tokens before it. This supports generative pre-training followed by fine-tuning for tasks like classification. The appeal is that a transformer architecture already proven on sequential data can be applied to images with few changes.
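The quantization step can be illustrated with a minimal sketch. It is not the repository's code: the function names (nearest, kmeans, quantize), the toy image, and the palette size of 2 are all assumptions chosen for illustration, and it uses only the Python standard library so that it runs anywhere.

```python
import random


def nearest(pixel, centroids):
    """Index of the centroid closest to `pixel` (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(pixel, centroids[i])),
    )


def kmeans(pixels, k, iters=10, seed=0):
    """Plain Lloyd's algorithm over RGB tuples; returns k centroids."""
    rng = random.Random(seed)
    # Initialise centroids from k distinct pixels of the data set.
    centroids = [tuple(map(float, p)) for p in rng.sample(pixels, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in pixels:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its cluster; an empty cluster
        # keeps its previous centroid.
        centroids = [
            tuple(sum(ch) / len(c) for ch in zip(*c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids


def quantize(image, centroids):
    """Flatten an H x W image in raster order into a list of token ids."""
    return [nearest(px, centroids) for row in image for px in row]


# Toy 2x3 "image" with two obvious colour groups (dark and bright).
image = [
    [(0, 0, 0), (10, 5, 0), (250, 250, 255)],
    [(255, 255, 255), (5, 0, 10), (245, 255, 250)],
]
pixels = [px for row in image for px in row]
centroids = kmeans(pixels, k=2)
tokens = quantize(image, centroids)
print(tokens)  # one token id per pixel, in raster order
```

The resulting token list is what a sequence model would be trained on. Mapping each token back to its centroid gives a reduced-palette version of the image.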

Quick Start & Requirements

  • Install via pip.
  • Requires PyTorch.
  • GPU (NVIDIA 2070 mentioned for Fashion-MNIST training) and CUDA are recommended for reasonable training times.
  • Download pre-trained models using ./download.sh.
  • Official quick-start and usage examples are provided within the README.

Highlighted Details

  • Reproduces iGPT-S architecture and training methodology.
  • Supports generative pre-training and classification fine-tuning.
  • Includes scripts for computing centroids, training, sampling, and generating GIFs.
  • Demonstrates training a small model (26K parameters) on Fashion-MNIST in under 2 hours on a single NVIDIA 2070.
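As a rough illustration of what sampling from an autoregressive pixel model involves, the sketch below generates tokens one at a time, each conditioned on the tokens so far. It does not reflect the repository's sampling script: next_token_probs is a hypothetical stand-in for a trained transformer, and VOCAB_SIZE, sample, and to_grid are names invented for this example.

```python
import random

VOCAB_SIZE = 4  # hypothetical palette size (number of k-means centroids)


def next_token_probs(prefix):
    """Stand-in for a trained model: p(next token | prefix).

    This toy rule simply favours repeating the previous token. A real
    transformer would compute the distribution from the whole prefix.
    """
    if not prefix:
        return [1.0 / VOCAB_SIZE] * VOCAB_SIZE
    probs = [0.1] * VOCAB_SIZE
    probs[prefix[-1]] = 0.7
    return probs


def sample(length, prompt=(), seed=0):
    """Extend `prompt` to `length` tokens, drawing one token at a time."""
    rng = random.Random(seed)
    tokens = list(prompt)
    while len(tokens) < length:
        probs = next_token_probs(tokens)
        tokens.append(rng.choices(range(VOCAB_SIZE), weights=probs)[0])
    return tokens


def to_grid(tokens, width):
    """Reshape a raster-order token list into rows of `width` tokens."""
    return [tokens[i:i + width] for i in range(0, len(tokens), width)]


# Complete a 4x4 image whose first row is given as a prompt.
grid = to_grid(sample(length=16, prompt=[2, 2, 2, 2]), width=4)
for row in grid:
    print(row)
```

Supplying the top rows of a real image as the prompt turns the same loop into image completion. Each generated token would then be mapped back to its centroid color for display.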

Maintenance & Community

The project is marked as "WIP" (Work In Progress). No specific community channels or notable contributors are mentioned in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project is a work in progress with several planned features yet to be implemented, including batched k-means on GPU, BERT-style pre-training, and loading OpenAI's official pre-trained models. Reproducing iGPT-S results is a stated goal but may require significant compute resources.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 90 days

Explore Similar Projects

Starred by Aravind Srinivas (Cofounder of Perplexity), Ross Taylor (Cofounder of General Reasoning; Creator of Papers with Code), and 3 more.

pixel-cnn by openai

0.1%
2k
TensorFlow implementation for PixelCNN++ research paper
created 9 years ago
updated 5 years ago
Starred by Aravind Srinivas (Cofounder of Perplexity), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 3 more.

guided-diffusion by openai

0.2%
7k
Image synthesis codebase for diffusion models
created 4 years ago
updated 1 year ago