image-gpt  by openai

Image generation research paper, code, and models

Created 5 years ago
2,067 stars

Top 21.5% on SourcePulse

Project Summary

This repository provides code and pre-trained models for Image GPT (iGPT), a generative model for images based on the GPT-2 architecture. It enables researchers and engineers to experiment with pixel-level generative pre-training for image synthesis and analysis.

How It Works

iGPT adapts the GPT-2 transformer architecture to images by treating the pixels of an image as a one-dimensional sequence. Colors are quantized to a 9-bit palette (512 entries), so each pixel becomes a single token, and a start-of-sequence token lets the model generate the pixels autoregressively, one at a time. The same model can be used both to sample new images and to evaluate the likelihood of existing ones.
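The pixel-to-token step can be illustrated with a minimal NumPy sketch. It is not the repository's code: the palette below is random stand-in data rather than the color clusters fetched by download.py, the function names `quantize` and `dequantize` are hypothetical, and using id 512 for the start-of-sequence token is an assumption made purely for illustration.

```python
import numpy as np

def quantize(pixels, palette):
    """Map each RGB pixel (N, 3) to the index of its nearest palette color (K, 3)."""
    # Squared Euclidean distance from every pixel to every palette entry: (N, K)
    dists = ((pixels[:, None, :] - palette[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def dequantize(tokens, palette):
    """Map palette indices back to RGB values."""
    return palette[tokens]

rng = np.random.default_rng(0)
palette = rng.uniform(0, 255, size=(512, 3))   # stand-in for the 9-bit palette
image = rng.uniform(0, 255, size=(32, 32, 3))  # stand-in 32x32 RGB image

# Flatten in raster order so the image becomes a sequence of 1024 tokens.
tokens = quantize(image.reshape(-1, 3), palette)

SOS = 512  # hypothetical start-of-sequence id, one past the palette range
sequence = np.concatenate([[SOS], tokens])

# Approximate reconstruction of the image from its tokens.
reconstructed = dequantize(tokens, palette).reshape(32, 32, 3)
```

With a 512-entry palette, every pixel costs one token instead of three color channels, which shortens the sequence the transformer has to model.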

Quick Start & Requirements

  • Install: Use conda to create an environment and install dependencies:
    conda create --name image-gpt python=3.7.3
    conda activate image-gpt
    conda install numpy=1.16.3 tensorflow-gpu=1.13.1 imageio=2.8.0 requests=2.21.0 tqdm=4.46.0
    
  • Prerequisites: Ubuntu 16.04, Python 3.7.3, TensorFlow GPU 1.13.1, NVIDIA GPU with CUDA.
  • Models/Data: Download checkpoints, ImageNet, CIFAR-10, and color clusters using download.py.
  • Docs: Usage examples for sampling and evaluation are provided in the README.

Highlighted Details

  • Implements generative pre-training from pixels using a GPT-2 codebase fork.
  • Supports sampling and evaluation of iGPT models (S, M, L) with provided checkpoints.
  • Achieves generative losses matching paper figures (e.g., 2.0895 for iGPT-S on ImageNet).
  • Includes utilities for color quantization and dequantization for the 9-bit palette.
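As a rough illustration of what a generative loss measures, the sketch below computes the mean negative log-likelihood per token from a matrix of logits. It is an assumption-laden example, not the repository's evaluation code: the function name `autoregressive_nll` is hypothetical, the logits are synthetic, and whether the reported figures use exactly this normalization or unit is not confirmed here.

```python
import numpy as np

def autoregressive_nll(logits, targets):
    """Mean negative log-likelihood per token, in nats.

    logits:  (T, V) unnormalized scores for each of T positions
    targets: (T,)   the token that actually occurred at each position
    """
    # Numerically stable log-softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability assigned to each observed token.
    return -log_probs[np.arange(len(targets)), targets].mean()

V, T = 512, 1024                       # palette size and a 32x32 pixel sequence
logits = np.zeros((T, V))              # a model that predicts uniformly
targets = np.random.default_rng(0).integers(0, V, size=T)

loss = autoregressive_nll(logits, targets)
# A uniform predictor over 512 tokens scores log(512) nats per token.
```

A uniform baseline like this gives a useful upper reference point: any trained model should score well below it.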

Maintenance & Community

  • Status: Archived (code provided as-is, no updates expected).
  • Primary contributor: OpenAI.
  • Citation: Chen et al., "Generative Pretraining from Pixels", 2020.

Licensing & Compatibility

  • License: Modified MIT.
  • Compatibility: Generally permissive for commercial use, but the "Modified MIT" license should be reviewed for specific terms.

Limitations & Caveats

The project is archived, indicating no further development or support. It requires specific, older versions of TensorFlow (1.13.1) and Python (3.7.3), which may pose compatibility challenges with modern systems and libraries. The provided datasets are center-cropped, not randomly cropped, which may affect replication of training results.

Health Check

  • Last commit: 3 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 30 days
