SD-Latent-Interposer by city96

Neural network for Stable Diffusion latent space interoperability

Created 2 years ago

312 stars

Top 86.5% on SourcePulse

1 Expert Loves This Project

comfyanonymous

Author of ComfyUI; Cofounder of Comfy Org

Project Summary

This project provides a neural-network-based ComfyUI custom node that converts latents directly between the latent spaces of different Stable Diffusion models, without a VAE decode and re-encode step. It is aimed at Stable Diffusion users who want to use latents from newer models (such as SDXL, SD3, Flux.1, or Stable Cascade) with older architectures (SDv1.x), or vice versa. The result is a more streamlined workflow that may also preserve finer details.

How It Works

The interposer uses a small neural network trained to map latents from one Stable Diffusion model's latent space to another. This avoids the lossy VAE decode/encode cycle, with the aim of preserving image fidelity and composition. Training minimizes several loss terms, including direct latent reconstruction (p_loss, b_loss) and round-trip consistency (r_loss, h_loss), so that conversions between latent representations stay accurate.
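The two loss families named above can be illustrated with a toy example. The following is a minimal sketch, not the project's training code: it swaps the neural network for a plain per-pixel linear map and uses a mean absolute error. The names `apply_linear`, `mean_abs_error`, and `combined_loss`, as well as the loss weights, are assumptions made for illustration. How the repository actually defines p_loss, b_loss, r_loss, and h_loss is not covered here.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def apply_linear(weights: Matrix, pixel: Vector) -> Vector:
    """Map one latent pixel (a vector of channel values) through a linear layer."""
    return [sum(w * x for w, x in zip(row, pixel)) for row in weights]


def mean_abs_error(a: Vector, b: Vector) -> float:
    """Average absolute difference between two equally long vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def combined_loss(forward: Matrix, backward: Matrix, src: Vector, dst: Vector,
                  direct_weight: float = 1.0, round_trip_weight: float = 1.0) -> float:
    """Weighted sum of a direct term and a round-trip term (weights are assumed)."""
    # Direct term: the converted latent should match the target-space latent.
    predicted = apply_linear(forward, src)
    direct = mean_abs_error(predicted, dst)
    # Round-trip term: converting back should recover the source latent.
    restored = apply_linear(backward, predicted)
    round_trip = mean_abs_error(restored, src)
    return direct_weight * direct + round_trip_weight * round_trip


# Toy data: the "target space" is the source space scaled by 2.
forward = [[2.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
backward = [[0.5 if i == j else 0.0 for j in range(4)] for i in range(4)]
src = [1.0, 2.0, 3.0, 4.0]
dst = [2.0, 4.0, 6.0, 8.0]
print(combined_loss(forward, backward, src, dst))
```

With a perfectly matched forward and backward map, both terms vanish. A mismatch in either direction raises the total, which is the pressure that keeps the two conversion directions consistent with each other.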

Quick Start & Requirements

Install by cloning the repo into custom_nodes/SD-Latent-Interposer or placing comfy_latent_interposer.py in ComfyUI/custom_nodes/.
Requires huggingface-hub (pip install huggingface-hub).
Models are downloaded from Hugging Face by default; local models can be placed in custom_nodes/SD-Latent-Interposer/models.
Official documentation and examples are available within the repository.
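The local-first model lookup described above can be sketched as follows. This is an illustration under stated assumptions, not the node's actual loading code: `resolve_weights`, `download_from_hub`, and the `REPO_ID` value are hypothetical names chosen for this example. The only real library call is `hf_hub_download` from huggingface-hub.

```python
import os
from typing import Callable, Optional

# Assumed Hugging Face repo id; check the project README for the real location.
REPO_ID = "city96/SD-Latent-Interposer"


def download_from_hub(filename: str) -> str:
    """Fetch a file into the local Hugging Face cache and return its path."""
    # Imported lazily so the local-only path works without huggingface-hub installed.
    from huggingface_hub import hf_hub_download
    return hf_hub_download(repo_id=REPO_ID, filename=filename)


def resolve_weights(filename: str, models_dir: str,
                    download: Optional[Callable[[str], str]] = None) -> str:
    """Return a path to the weights, preferring a local copy over a download."""
    local_path = os.path.join(models_dir, filename)
    if os.path.isfile(local_path):
        return local_path
    # Fall back to the supplied downloader, or to the Hub helper by default.
    return (download or download_from_hub)(filename)
```

A call such as `resolve_weights(name, "custom_nodes/SD-Latent-Interposer/models")` would then return the local file if it exists and only reach out to Hugging Face otherwise. The filename itself depends on the conversion direction and model version and has to be taken from the repository.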

Highlighted Details

Supports interoperability between SDv1.x, SDXL, SD3, Flux.1, and Stable Cascade (Stage A/B).
Offers pre-trained models for various conversion directions (e.g., v1 to XL, XL to v1, v3 to v1).
Training code is provided, allowing for custom model training with a specified dataset format.
Version 4.0 models are available, with previous versions (v3.1, v1.1) also documented.

Maintenance & Community

The project is maintained by city96.
Model weights are hosted on Hugging Face.
No specific community channels (Discord/Slack) or roadmap are explicitly mentioned in the README.

Licensing & Compatibility

The repository is licensed under the Apache 2.0 license.
This license is permissive and generally compatible with commercial use and closed-source linking.

Limitations & Caveats

Some conversion directions (e.g., to Flux.1, to Stable Cascade) are marked as "No" in the compatibility table.
The README notes that color/hue/saturation shifts can still be an issue with certain conversions, and artifacts may persist.
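A workflow can guard against the unsupported directions before attempting a conversion. The sketch below is hypothetical: the short keys in `FAMILIES` and the rule in `UNSUPPORTED_TARGETS` only encode the caveat stated above (conversions into Flux.1 and Stable Cascade are marked "No"). The compatibility table in the README remains the authoritative source and may differ in detail.

```python
from typing import List, Tuple

# Hypothetical short keys for the model families named in this summary.
FAMILIES = {
    "v1": "SDv1.x",
    "xl": "SDXL",
    "v3": "SD3",
    "fx": "Flux.1",
    "ca": "Stable Cascade",
}

# Assumption: every direction *into* these families is unsupported.
UNSUPPORTED_TARGETS = {"fx", "ca"}


def is_supported(src: str, dst: str) -> bool:
    """Return True if converting latents from src to dst is assumed to be available."""
    for key in (src, dst):
        if key not in FAMILIES:
            raise ValueError(f"unknown model family: {key!r}")
    # Converting a family to itself needs no interposer at all.
    return src != dst and dst not in UNSUPPORTED_TARGETS


def supported_directions() -> List[Tuple[str, str]]:
    """List every (source, target) pair that passes the check above."""
    return [(s, d) for s in FAMILIES for d in FAMILIES if is_supported(s, d)]
```

Under these assumptions, `is_supported("xl", "v1")` is true while `is_supported("v1", "fx")` is false, so a caller can fall back to a regular VAE decode/encode for the latter.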

Health Check

Last Commit: 1 year ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0

Star History

0 stars in the last 30 days

Explore Similar Projects

reconstruction-alignment by HorizonWind2004

Self-supervised learning for enhanced unified multimodal models

Created 4 months ago

Updated 2 days ago

FreeDoM by yujiwen

ICCV 2023 paper implementing training-free conditional diffusion

Created 2 years ago

Updated 2 years ago

MistoControlNet-Flux-dev by TheMistoAI

ControlNet for lineart/outline sketches, compatible with Flux1.dev

Created 1 year ago

Updated 4 months ago

LanPaint by scraed

ComfyUI node for training-free diffusion inpainting

Created 10 months ago

Updated 1 day ago

Starred by Jiaming Song (Chief Scientist at Luma AI).

segmoe by segmind

Framework for dynamic Stable Diffusion Mixture of Experts, no training needed

Created 2 years ago

Updated 1 year ago

Starred by Tobi Lutke (Cofounder of Shopify), Christian Laforte (Distinguished Engineer at NVIDIA; Former CTO at Stability AI), and 3 more.

taesd by madebyollin

Tiny AutoEncoder for Stable Diffusion latents

Created 2 years ago

Updated 2 weeks ago

LightningDiT by hustvl

Image generation research paper using latent diffusion

Created 1 year ago

Updated 3 weeks ago

Starred by Yaowei Zheng (Author of LLaMA-Factory).

Show-o by showlab

Unified transformer research paper for multimodal tasks

Created 1 year ago

Updated 3 days ago

text2image-gui by n00mkrad

GUI for Stable Diffusion text-to-image generation

Created 3 years ago

Updated 3 weeks ago

CatVTON by Zheng-Chong

Virtual try-on diffusion model research paper

Created 1 year ago

Updated 3 weeks ago

Starred by Robin Rombach (Cofounder of Black Forest Labs), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

Kandinsky-2 by ai-forever

Multilingual text-to-image latent diffusion model

Created 3 years ago

Updated 1 year ago

Starred by Vincent Weisser (Cofounder of Prime Intellect), Benjamin Bolte (Cofounder of K-Scale Labs), and 29 more.

ControlNet by lllyasviel

Neural network structure for adding conditional control to diffusion models

Created 2 years ago

Updated 1 year ago
