ICT by raywzy

Image completion research paper using transformers

created 4 years ago
334 stars

Top 82.2% on SourcePulse

Project Summary

This repository provides the official PyTorch implementation of the Image Completion Transformer (ICT) for high-fidelity, pluralistic image completion. It targets researchers and practitioners in computer vision and generative modeling who need to fill missing image regions with realistic and diverse content. ICT uses a transformer on the premise that it captures shape and geometry better than traditional CNN-based methods.

How It Works

ICT works in two stages. First, a transformer generates a coarse, semantically plausible completion at low resolution. Second, a guided upsampling network refines that completion to the full output resolution, restoring detail while staying faithful to the known pixels. The transformer's attention mechanism captures long-range dependencies and global context, which is what lets the filled region stay consistent with the rest of the image.
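The data flow of the two stages can be illustrated with a toy sketch. This is not the repository's code: the function names, the uniform random sampling (standing in for a transformer's predicted distribution), the nearest-neighbour upsampling (standing in for the learned upsampling network), and the vocabulary size are all assumptions chosen only to show how known positions are kept, masked positions are sampled, and different seeds give different completions.

```python
import random


def sample_coarse_completion(tokens, mask, vocab_size, rng):
    """Stage 1 stand-in: keep known tokens, sample the masked ones.

    A real model would draw each masked token from a transformer's
    predicted distribution; a uniform draw is only a placeholder.
    """
    return [
        [rng.randrange(vocab_size) if mask[r][c] else tokens[r][c]
         for c in range(len(tokens[r]))]
        for r in range(len(tokens))
    ]


def upsample_nearest(grid, factor):
    """Stage 2 stand-in: nearest-neighbour upsampling, not a learned network."""
    out = []
    for row in grid:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def complete(tokens, mask, vocab_size, factor, seed):
    """Run both toy stages; a different seed gives a different completion."""
    rng = random.Random(seed)
    coarse = sample_coarse_completion(tokens, mask, vocab_size, rng)
    return coarse, upsample_nearest(coarse, factor)


if __name__ == "__main__":
    tokens = [[1, 2], [3, 0]]                # hypothetical coarse token grid
    mask = [[False, False], [False, True]]   # True marks a missing position
    for seed in range(3):
        coarse, _ = complete(tokens, mask, vocab_size=512, factor=2, seed=seed)
        print(seed, coarse)
```

Sampling in the first stage is what makes the completion pluralistic: the known tokens are identical across runs, while each seed may fill the masked positions differently.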

Quick Start & Requirements

  • Install: pip install -r requirements.txt
  • Prerequisites: Python >=3.6, PyTorch >=1.6, NVIDIA GPU + CUDA, cuDNN.
  • Pre-trained models must be downloaded separately.
  • Input images and masks should be in .png format, and masks must be binarized.
  • Model is trained for 256x256 input resolution.
  • See Project Page for more details.
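The input requirements above can be checked before running inference. The sketch below is a hypothetical helper, not part of the repository: it works on a mask already loaded as rows of 0-255 grayscale values (reading the .png itself would need an image library such as Pillow), and the threshold of 127 is an assumption. The summary does not say whether the missing region is marked by 0 or 255, so verify that convention against the repository.

```python
def binarize_mask(mask, threshold=127):
    """Map every grayscale value to exactly 0 or 255 (threshold is an assumption)."""
    return [[255 if value > threshold else 0 for value in row] for row in mask]


def check_inputs(image_size, mask_size, expected=(256, 256)):
    """Return a list of problems; an empty list means the pair looks usable."""
    problems = []
    if image_size != mask_size:
        problems.append(f"image {image_size} and mask {mask_size} differ in size")
    if image_size != expected:
        problems.append(f"image is {image_size}, but the model expects {expected}")
    return problems


if __name__ == "__main__":
    soft_mask = [[0, 100, 128, 255]]   # e.g. a mask with anti-aliased edges
    print(binarize_mask(soft_mask))
    print(check_inputs((512, 512), (256, 256)))
```

An anti-aliased or JPEG-converted mask usually contains intermediate gray values, which is the typical reason a mask fails the "binarized" requirement.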

Highlighted Details

  • Official PyTorch implementation of ICCV 2021 paper.
  • Transformer-based architecture for improved shape and geometry understanding.
  • Pluralistic completion allows for multiple diverse results.
  • Supports inference and training for both transformer and upsampling components.

Maintenance & Community

The repository is maintained by Ziyu Wan (@Raywzy), who can be reached by email at raywzy@gmail.com.

Licensing & Compatibility

The repository is for academic research use only. No specific license is mentioned, implying potential restrictions on commercial use.

Limitations & Caveats

The model is trained exclusively for 256x256 resolution. Masks must be binarized and saved as .png files. Pre-trained models are large and must be downloaded manually. The "academic research use only" clause may restrict commercial applications.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 7 stars in the last 90 days
