instruct-pix2pix by timothybrooks

Image editing model for instruction-based image manipulation

created 2 years ago
6,752 stars

Top 7.7% on sourcepulse

Project Summary

This repository provides a PyTorch implementation of InstructPix2Pix, a model for instruction-based image editing. It allows users to edit images by providing natural language instructions, offering a powerful tool for creative image manipulation and content generation.

How It Works

InstructPix2Pix is fine-tuned from a Stable Diffusion checkpoint. It leverages a large, generated dataset of image-instruction-output triplets. The model's core innovation lies in its ability to interpret and apply textual editing instructions to modify input images, balancing adherence to the instruction with preservation of the original image's structure.
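This balance is achieved with the paper's two-scale classifier-free guidance, which combines three noise predictions: unconditional, image-conditioned, and fully conditioned. A minimal sketch of just that combination step (pure Python with scalars standing in for noise tensors; the function and argument names are illustrative, not from the codebase):

```python
def dual_cfg(e_uncond, e_img, e_full, s_image, s_text):
    """Combine three noise predictions with two guidance scales:
      e_uncond: prediction with no conditioning
      e_img:    prediction conditioned on the input image only
      e_full:   prediction conditioned on image and instruction
    s_image (cfg-image) pushes the result toward the input image;
    s_text (cfg-text) pushes it toward the edit instruction.
    """
    return (e_uncond
            + s_image * (e_img - e_uncond)
            + s_text * (e_full - e_img))

# With both scales at 1.0 the combination reduces to the fully
# conditioned prediction:
assert dual_cfg(0.0, 1.0, 2.0, 1.0, 1.0) == 2.0
```

Raising `s_text` above 1.0 makes edits more aggressive; raising `s_image` keeps the output closer to the original photo.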

Quick Start & Requirements

  • Install dependencies and download checkpoints:
    conda env create -f environment.yaml
    conda activate ip2p
    bash scripts/download_checkpoints.sh
    
  • Edit an image:
    python edit_cli.py --input imgs/example.jpg --output imgs/output.jpg --edit "turn him into a cyborg"
    
  • Requires a GPU with >18GB VRAM for default settings.
  • Official Docs: https://github.com/timothybrooks/instruct-pix2pix
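The same CLI exposes the two guidance scales directly. An illustrative invocation (the `--steps`, `--cfg-text`, and `--cfg-image` flags follow the repository's edit_cli.py; the values shown are only a starting point, and running it requires the downloaded checkpoints and a large GPU):

```shell
# Higher --cfg-text pushes the result toward the instruction;
# higher --cfg-image preserves more of the input image.
python edit_cli.py \
  --input imgs/example.jpg \
  --output imgs/output.jpg \
  --edit "turn him into a cyborg" \
  --steps 100 \
  --cfg-text 7.5 \
  --cfg-image 1.5
```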

Highlighted Details

  • Trained on a dataset of 451,990 examples generated using GPT-3 and Stable Diffusion.
  • Offers fine-grained control over editing via the --cfg-text and --cfg-image flags (text and image classifier-free guidance scales, respectively).
  • Includes an interactive Gradio app for real-time editing.
  • Codebase is based on CompVis/stable_diffusion.

Maintenance & Community

  • Project initiated by Tim Brooks, Aleksander Holynski, and Alexei A. Efros from UC Berkeley.
  • Available on HuggingFace Spaces for browser-based demos and Replicate for API access.
  • Integrations available via imaginairy and Hugging Face diffusers.
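For the diffusers route, a minimal sketch is below. The pipeline class and the timbrooks/instruct-pix2pix Hub weights are real, but file paths and parameter values are illustrative; running it downloads the model weights and assumes a CUDA GPU:

```python
import torch
from diffusers import StableDiffusionInstructPix2PixPipeline
from PIL import Image

# Load the released InstructPix2Pix weights from the Hugging Face Hub.
pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
    "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
).to("cuda")

image = Image.open("imgs/example.jpg").convert("RGB")

# guidance_scale corresponds to cfg-text, image_guidance_scale to cfg-image.
edited = pipe(
    "turn him into a cyborg",
    image=image,
    num_inference_steps=100,
    guidance_scale=7.5,
    image_guidance_scale=1.5,
).images[0]
edited.save("imgs/output.jpg")
```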

Licensing & Compatibility

  • The README does not explicitly state a license. The codebase derives from CompVis/stable-diffusion, whose model weights are distributed under the CreativeML Open RAIL-M license (which carries use-based restrictions). Compatibility for commercial use should be verified before deployment.

Limitations & Caveats

  • The default setup requires a high-end GPU (>18GB VRAM).
  • The quality of edits can be sensitive to parameter tuning (CFG scales, steps) and instruction phrasing.
  • Faces that are small in the input image may not be rendered well due to Stable Diffusion's autoencoder limitations.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
Star History

  • 129 stars in the last 90 days

Starred by Dan Abramov (core contributor to React), Patrick von Platen (core contributor to Hugging Face Transformers and Diffusers), and 28 more.

Explore Similar Projects

  • stable-diffusion by CompVis — latent text-to-image diffusion model. 71k stars, top 0.1%; created 3 years ago, updated 1 year ago.