Vision task framework aligning CV with instructions
InstructDiffusion provides a unifying framework for aligning computer vision tasks with human instructions, enabling users to edit images based on natural language prompts. It is built upon Instruct-pix2pix and Stable Diffusion, offering a generalist interface for various vision tasks.
How It Works
The framework leverages diffusion models, specifically adapting the Instruct-pix2pix architecture. It processes user instructions to guide the image generation or editing process, allowing for precise control over visual transformations through text prompts. This approach aims to unify diverse vision tasks under a single, instruction-following paradigm.
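At inference time, Instruct-pix2pix-style models steer generation with classifier-free guidance over both the text instruction and the input image, blending three noise predictions with separate guidance weights. A minimal sketch of that combination follows; the function name, argument names, and default weights are illustrative, not taken from the repository's code:

```python
import numpy as np

def combine_guidance(eps_uncond, eps_img, eps_full,
                     image_scale=1.5, text_scale=7.5):
    """Blend noise predictions, Instruct-pix2pix style (illustrative sketch).

    eps_uncond: prediction with neither image nor text conditioning
    eps_img:    prediction conditioned on the input image only
    eps_full:   prediction conditioned on both image and instruction
    """
    # Push away from the unconditional prediction toward the image-conditioned
    # one, then further toward the fully conditioned one.
    return (eps_uncond
            + image_scale * (eps_img - eps_uncond)
            + text_scale * (eps_full - eps_img))

# With both scales set to 1 the blend collapses to the fully
# conditioned prediction:
e0, e1, e2 = np.zeros(4), np.ones(4), np.full(4, 2.0)
assert np.allclose(combine_guidance(e0, e1, e2, 1.0, 1.0), e2)
```

Raising image_scale keeps the edit closer to the input image, while raising text_scale makes the model follow the instruction more aggressively; the two weights trade off against each other.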
Quick Start & Requirements
Create the environment from environment.yaml. Download the pretrained weights into the checkpoints folder, or fetch them with bash scripts/download_pretrained_instructdiffusion.sh. Edit an image from the command line with python edit_cli.py, supplying an input image path and an edit prompt, or launch the interactive demo with python edit_app.py.
Highlighted Details
Maintenance & Community
The project acknowledges contributions from Stable Diffusion and Instruct-pix2pix. No specific community channels or active maintenance signals are detailed in the README.
Licensing & Compatibility
The README does not explicitly state the license. It is based on Instruct-pix2pix and Stable Diffusion, which have their own licenses. Compatibility for commercial use or closed-source linking would require checking the licenses of the underlying projects.
Limitations & Caveats
The code is primarily developed and tested on Ubuntu 18.04, and training was performed on a specific hardware configuration (48 V100 32GB GPUs). Compatibility with other platforms or hardware setups is not guaranteed.