image-gpt by openai

Image generation research paper, code, and models

Created 5 years ago

2,078 stars

Top 21.3% on SourcePulse

7 Experts Love This Project

Aravind Srinivas

Cofounder of Perplexity

Steve Sewell

Founder of Builder.io

Chenlin Meng

Cofounder of Pika

Lewis Tunstall

Research Engineer at Hugging Face

and 3 more!

Project Summary

This repository provides code and pre-trained models for Image GPT (iGPT), a generative model for images based on the GPT-2 architecture. It enables researchers and engineers to experiment with pixel-level generative pre-training for image synthesis and analysis.

How It Works

iGPT adapts the GPT-2 transformer architecture to images by flattening each image into a one-dimensional sequence of pixels. Each pixel is quantized to one entry of a 9-bit (512-color) palette, and a start-of-sequence token is prepended so that the model can generate the sequence autoregressively, one pixel at a time. The same pre-trained model can then be used both to sample images and to evaluate generative loss.
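The quantization step described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the repository's code: the function names `quantize` and `to_sequence`, the random stand-in palette, the 0-255 pixel range, and the choice of `512` as the start-of-sequence id are all assumptions made for the example.

```python
import numpy as np

def quantize(image, palette):
    """Map each RGB pixel to the index of its nearest palette color."""
    # Flatten to (H*W, 3); row-major reshape keeps pixels in raster order.
    pixels = image.reshape(-1, 3).astype(np.float32)
    # Squared Euclidean distance from every pixel to every palette entry.
    d2 = ((pixels[:, None, :] - palette[None, :, :]) ** 2).sum(axis=-1)
    return d2.argmin(axis=1)  # (H*W,) integer tokens in [0, len(palette))

def to_sequence(tokens, sos_id):
    """Prepend a start-of-sequence id (hypothetical convention) to the tokens."""
    return np.concatenate([[sos_id], tokens])

rng = np.random.default_rng(0)
# Stand-in for the learned color clusters: 2**9 = 512 RGB centroids.
palette = rng.uniform(0, 255, size=(512, 3)).astype(np.float32)
image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

tokens = quantize(image, palette)
sequence = to_sequence(tokens, sos_id=512)  # assumed id, one past the palette
print(sequence.shape)
```

A 32x32 image becomes 1,024 palette indices plus one leading token, which is the kind of integer sequence a GPT-2-style model can be trained on with next-token prediction.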

Quick Start & Requirements

Install: Use conda to create an environment and install dependencies:

conda create --name image-gpt python=3.7.3
conda activate image-gpt
conda install numpy=1.16.3 tensorflow-gpu=1.13.1 imageio=2.8.0 requests=2.21.0 tqdm=4.46.0

Prerequisites: Ubuntu 16.04, Python 3.7.3, TensorFlow GPU 1.13.1, NVIDIA GPU with CUDA.
Models/Data: Download checkpoints, ImageNet, CIFAR-10, and color clusters using download.py.
Docs: Usage examples for sampling and evaluation are provided in the README.

Highlighted Details

Implements generative pre-training from pixels using a GPT-2 codebase fork.
Supports sampling and evaluation of iGPT models (S, M, L) with provided checkpoints.
Reproduces the generative losses reported in the paper (e.g., 2.0895 for iGPT-S on ImageNet).
Includes utilities for color quantization and dequantization for the 9-bit palette.
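Dequantization is the inverse lookup: each sampled token indexes into the palette to recover an RGB color. The sketch below assumes a hypothetical `dequantize` helper, a tiny four-color palette in the 0-255 range, and raster-order tokens; the repository's own utilities may differ in naming and value range.

```python
import numpy as np

def dequantize(tokens, palette, height, width):
    """Turn a flat sequence of palette indices back into an RGB image."""
    colors = palette[np.asarray(tokens)]  # (H*W, 3) lookup by index
    # Round to the nearest displayable 8-bit value and restore the 2-D layout.
    rgb = np.clip(np.rint(colors), 0, 255).astype(np.uint8)
    return rgb.reshape(height, width, 3)

# Toy palette for illustration only; the real palette has 512 entries.
palette = np.array(
    [[0, 0, 0], [255, 0, 0], [0, 255, 0], [0, 0, 255]], dtype=np.float32
)
tokens = np.array([0, 1, 2, 3])  # a 2x2 image in raster order
image = dequantize(tokens, palette, height=2, width=2)
print(image[0, 1], image[1, 1])
```

Because quantization snaps each pixel to its nearest palette entry, a quantize-then-dequantize round trip is lossy: the result is the closest palette approximation of the original image, not the original itself.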

Maintenance & Community

Status: Archived (code provided as-is, no updates expected).
Primary contributor: OpenAI.
Citation: Chen et al., "Generative Pretraining from Pixels", 2020.

Licensing & Compatibility

License: Modified MIT.
Compatibility: Generally permissive for commercial use, but the "Modified MIT" license should be reviewed for specific terms.

Limitations & Caveats

The project is archived, so no further development or support should be expected. It pins old versions of TensorFlow (1.13.1) and Python (3.7.3), which may be difficult to install alongside modern systems and libraries. The provided datasets are center-cropped rather than randomly cropped, which may affect replication of training results.

Health Check

Last Commit

3 years ago

Responsiveness

Inactive

Star History

3 stars in the last 30 days