ReVersion by ziqihuangg

Research paper for diffusion-based relation inversion from images

Created 2 years ago

506 stars

Top 61.7% on SourcePulse

Project Summary

This project introduces ReVersion, a method for extracting and applying relational concepts from images using diffusion models. It enables users to "invert" a specific relationship (e.g., "painted on") from a few example images and then apply this learned relation to new entities, generating novel scenes. The target audience includes researchers and practitioners in generative AI and computer vision interested in controllable image synthesis and concept manipulation.

How It Works

ReVersion leverages diffusion models to learn a "relation prompt" that encapsulates the interaction or spatial arrangement present in exemplar images. This prompt is optimized to capture the essence of the relation, allowing it to be injected into new text-to-image generation processes. The key advantage is the ability to disentangle and reuse relational concepts independently of specific entities, enabling flexible and creative scene generation.
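As a rough illustration of optimizing only a prompt embedding while the generator stays frozen, here is a toy sketch in plain Python. It is not ReVersion's training code: the "model" is a fixed 2x2 linear map, the loss is plain squared error rather than a diffusion objective, and every name (`frozen_model`, `learn_relation_embedding`, the exemplar targets) is hypothetical.

```python
# Toy sketch: learn one "relation" embedding by gradient descent while the
# model weights stay fixed. The linear model and all names are hypothetical
# stand-ins, not ReVersion's actual implementation.

FROZEN_WEIGHTS = [[1.0, 0.5], [-0.5, 2.0]]  # never updated during learning


def frozen_model(embedding):
    """Fixed linear map standing in for a frozen text-to-image model."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in FROZEN_WEIGHTS]


def loss(embedding, targets):
    """Squared error between the model output and each exemplar, averaged."""
    out = frozen_model(embedding)
    return sum(
        sum((o - t) ** 2 for o, t in zip(out, target)) for target in targets
    ) / len(targets)


def grad(embedding, targets):
    """Analytic gradient of the loss with respect to the embedding only."""
    out = frozen_model(embedding)
    # Residual per output dimension, averaged over the exemplars.
    residual = [
        sum(o - t[i] for t in targets) / len(targets) for i, o in enumerate(out)
    ]
    # d(loss)/d(e_j) = 2 * sum_i residual_i * W[i][j]
    return [
        2.0 * sum(residual[i] * FROZEN_WEIGHTS[i][j] for i in range(len(residual)))
        for j in range(len(embedding))
    ]


def learn_relation_embedding(targets, steps=500, lr=0.05):
    """Gradient descent that updates the embedding and nothing else."""
    embedding = [0.0, 0.0]
    for _ in range(steps):
        g = grad(embedding, targets)
        embedding = [e - lr * gj for e, gj in zip(embedding, g)]
    return embedding


# Two made-up "exemplar" targets that share the same underlying relation.
targets = [[1.0, 2.0], [1.2, 1.8]]
embedding = learn_relation_embedding(targets)
print("learned embedding:", embedding)
print("final loss:", loss(embedding, targets))
```

The point of the sketch is only the division of labor: the weights are constants, and the embedding is the single quantity that the optimizer changes.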

Quick Start & Requirements

Install: Clone the repository and set up a Conda environment with PyTorch 1.11.0, torchvision 0.12.0, and cudatoolkit 11.3. Then install dependencies with pip install diffusers["torch"] and pip install -r requirements.txt.
Prerequisites: Python 3.8, PyTorch 1.11.0, CUDA 11.3.
Demo: An online Gradio demo is available, or launch a local one with python app_gradio.py.
Pre-trained Models: Available for download, or use the provided benchmark.
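Because the setup pins specific versions, a small check before running anything can save debugging time. The script below is a minimal sketch, not part of the repository: it assumes the PyPI distribution names `torch` and `torchvision`, and the helper names (`find_mismatches`, `installed_versions`, `base_version`) are made up for illustration.

```python
import sys
from importlib import metadata

# Pins taken from the requirements listed above; the distribution names
# "torch" and "torchvision" are assumed.
REQUIRED = {"torch": "1.11.0", "torchvision": "0.12.0"}
REQUIRED_PYTHON = (3, 8)


def base_version(version):
    """Drop a local build suffix such as '+cu113' from a version string."""
    return version.split("+", 1)[0]


def find_mismatches(installed, required=REQUIRED):
    """Return {package: (found, wanted)} for every package that differs.

    `installed` maps a package name to its version string, or None if missing.
    """
    mismatches = {}
    for name, wanted in required.items():
        found = installed.get(name)
        if found is None or base_version(found) != wanted:
            mismatches[name] = (found, wanted)
    return mismatches


def installed_versions(names):
    """Look up installed versions via importlib.metadata (Python 3.8+)."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    if sys.version_info[:2] != REQUIRED_PYTHON:
        print("Python", sys.version.split()[0], "found, 3.8 expected")
    problems = find_mismatches(installed_versions(REQUIRED))
    for name, (found, wanted) in problems.items():
        print(f"{name}: found {found}, expected {wanted}")
```

Run inside the activated Conda environment, the script prints nothing when the interpreter and both packages match the pins.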

Highlighted Details

Presented at SIGGRAPH Asia 2024.
Optimized code allows saving/loading only the relation prompt, not the entire model.
Includes the ReVersion Benchmark with 10 relations and diverse entities.
Supports generating images from single prompts or lists of prompts via templates.
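To make the template-based prompt lists concrete, here is a small sketch of how a set of prompts could be expanded from templates. It is an illustration only, not the repository's own mechanism: the `<R>` placeholder for the learned relation, the `build_prompts` helper, the template strings, and the entity names are all assumptions.

```python
from itertools import product

# "<R>" is assumed here as the placeholder token for the learned relation;
# the templates and entity names below are made up for illustration.
RELATION_TOKEN = "<R>"


def build_prompts(templates, subjects, objects, relation=RELATION_TOKEN):
    """Fill every template with every (subject, object) pair."""
    return [
        template.format(subject=s, relation=relation, object=o)
        for template, s, o in product(templates, subjects, objects)
    ]


templates = [
    "{subject} {relation} {object}",
    "a photo of {subject} {relation} {object}",
]
prompts = build_prompts(templates, ["cat", "dog"], ["stone"])
for p in prompts:
    print(p)
```

A single prompt is just the special case of one template, one subject, and one object.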

Maintenance & Community

The codebase is maintained by Ziqi Huang and Tianxing Wu. It is built on Stable Diffusion 1.5 and Hugging Face Diffusers.

Licensing & Compatibility

The repository does not explicitly state a license. The underlying models (Stable Diffusion) have their own licenses. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project requires specific older versions of PyTorch (1.11.0) and CUDA (11.3), which may pose compatibility challenges with newer hardware or software stacks. The license status for the project itself is unclear.

Health Check

Last Commit

4 months ago

Responsiveness

Inactive

Star History

1 star in the last 30 days