ReVersion by ziqihuangg

Research paper for diffusion-based relation inversion from images

created 2 years ago
504 stars

Top 62.6% on sourcepulse

Project Summary

This project introduces ReVersion, a method for extracting and applying relational concepts from images using diffusion models. It enables users to "invert" a specific relationship (e.g., "painted on") from a few example images and then apply this learned relation to new entities, generating novel scenes. The target audience includes researchers and practitioners in generative AI and computer vision interested in controllable image synthesis and concept manipulation.

How It Works

ReVersion leverages diffusion models to learn a "relation prompt" that encapsulates the interaction or spatial arrangement present in exemplar images. This prompt is optimized to capture the essence of the relation, allowing it to be injected into new text-to-image generation processes. The key advantage is the ability to disentangle and reuse relational concepts independently of specific entities, enabling flexible and creative scene generation.
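
The idea that only the relation prompt is optimized, while the generative model stays fixed, can be illustrated with a toy sketch. This is not ReVersion's training code or loss: a small frozen linear map stands in for the diffusion model, and all names here (FROZEN_W, optimize_relation_embedding) are hypothetical, chosen for illustration only.

```python
# Toy illustration only: a frozen 2x2 linear map stands in for the frozen
# diffusion model. Only the "relation embedding" is updated by gradient descent.
FROZEN_W = [[1.0, 0.5], [-0.5, 2.0]]  # stand-in model weights, never updated


def predict(embedding):
    """Apply the frozen stand-in model to the embedding."""
    return [sum(w * e for w, e in zip(row, embedding)) for row in FROZEN_W]


def loss(embedding, target):
    """Squared error between the model output and the target."""
    return sum((p - t) ** 2 for p, t in zip(predict(embedding), target))


def grad(embedding, target):
    """Gradient of the squared error with respect to the embedding only."""
    residual = [p - t for p, t in zip(predict(embedding), target)]
    return [
        2.0 * sum(FROZEN_W[i][j] * residual[i] for i in range(len(FROZEN_W)))
        for j in range(len(embedding))
    ]


def optimize_relation_embedding(target, steps=500, lr=0.05):
    """Plain gradient descent on the embedding; the model weights are untouched."""
    embedding = [0.0, 0.0]
    for _ in range(steps):
        g = grad(embedding, target)
        embedding = [e - lr * gi for e, gi in zip(embedding, g)]
    return embedding


if __name__ == "__main__":
    target = [1.0, 2.0]
    learned = optimize_relation_embedding(target)
    print("learned embedding:", learned)
    print("final loss:", loss(learned, target))
```

In the real method the embedding would be a token embedding fed to the text encoder and the objective would come from the diffusion model; the sketch only shows the division between frozen weights and the single learned vector.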

Quick Start & Requirements

  • Install: Clone the repository and create a Conda environment with PyTorch 1.11.0, torchvision 0.12.0, and cudatoolkit 11.3. Then install dependencies with pip install diffusers["torch"] followed by pip install -r requirements.txt.
  • Prerequisites: Python 3.8, PyTorch 1.11.0, CUDA 11.3.
  • Demo: An online Gradio demo is available, or launch a local one with python app_gradio.py.
  • Pre-trained Models: Available for download; the provided benchmark can be used to train your own.

Highlighted Details

  • Presented at SIGGRAPH Asia 2024.
  • Optimized code allows saving/loading only the relation prompt, not the entire model.
  • Includes the ReVersion Benchmark with 10 relations and diverse entities.
  • Supports generating images from single prompts or lists of prompts via templates.
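
Template-based prompt generation can be sketched as follows. This is a minimal illustration, not the repository's code: the placeholder token "<R>", the template strings, and the build_prompts helper are all assumptions introduced here to show how one learned relation could be combined with many entity pairs.

```python
# Hypothetical sketch: expand entity pairs into prompts around a relation token.
PLACEHOLDER = "<R>"  # assumed token standing for the learned relation prompt

# Hypothetical templates; {a} and {b} are entities, {r} is the relation token.
TEMPLATES = [
    "{a} {r} {b}",
    "a photo of {a} {r} {b}",
]


def build_prompts(entity_pairs, templates=TEMPLATES, placeholder=PLACEHOLDER):
    """Return one prompt per (entity pair, template) combination."""
    return [
        template.format(a=a, r=placeholder, b=b)
        for a, b in entity_pairs
        for template in templates
    ]


if __name__ == "__main__":
    for prompt in build_prompts([("cat", "stone"), ("dog", "canvas")]):
        print(prompt)
```

Each resulting string would then be passed to the text-to-image pipeline, with the placeholder resolved to the learned relation embedding.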

Maintenance & Community

The codebase is maintained by Ziqi Huang and Tianxing Wu. It is built upon Stable Diffusion 1.5 and Hugging Face Diffusers.

Licensing & Compatibility

The repository does not explicitly state a license. The underlying models (Stable Diffusion) have their own licenses. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project requires specific older versions of PyTorch (1.11.0) and CUDA (11.3), which may pose compatibility challenges with newer hardware or software stacks. The license status for the project itself is unclear.

Health Check

  • Last commit: 8 months ago
  • Responsiveness: 1+ week
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 3 stars in the last 90 days
