Virtual try-on research paper using latent diffusion and textual inversion
LaDI-VTON addresses the challenge of virtual try-on by leveraging latent diffusion models enhanced with textual inversion. It targets researchers and developers in e-commerce and metaverse applications, offering a novel approach to generating realistic images of models wearing specified garments.
How It Works
The core innovation is a latent diffusion model augmented with a custom autoencoder module featuring learnable skip connections. This design aims to preserve the wearer's characteristics during generation. To accurately represent garment textures and details, a textual inversion component maps garment features to CLIP token embeddings, creating pseudo-word tokens that condition the diffusion process.
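The pseudo-word conditioning step can be illustrated with a minimal sketch. It assumes a toy 4-dimensional embedding table standing in for CLIP's token embeddings, a hypothetical placeholder token <garment>, and a hypothetical inversion_adapter that is just a fixed linear projection (the real mapping is a trained network). It shows only the idea of turning garment features into several pseudo-word embeddings and splicing them into a prompt's embedding sequence. In the method as described, that sequence would then be encoded and used to condition the diffusion process, which is omitted here.

```python
from typing import Dict, List

Vector = List[float]

EMBED_DIM = 4  # toy size; real CLIP token embeddings are much larger

# Hypothetical toy embedding table standing in for CLIP's token embeddings.
TOKEN_EMBEDDINGS: Dict[str, Vector] = {
    "a": [0.1, 0.0, 0.0, 0.0],
    "photo": [0.0, 0.2, 0.0, 0.0],
    "of": [0.0, 0.0, 0.3, 0.0],
    "model": [0.0, 0.0, 0.0, 0.4],
    "wearing": [0.5, 0.5, 0.0, 0.0],
}
PLACEHOLDER = "<garment>"  # hypothetical placeholder for the pseudo-words


def inversion_adapter(garment_features: Vector, num_tokens: int) -> List[Vector]:
    """Hypothetical stand-in for the learned mapping: linearly projects a
    garment feature vector into `num_tokens` pseudo-word embeddings."""
    tokens: List[Vector] = []
    for k in range(num_tokens):
        token: Vector = []
        for i in range(EMBED_DIM):
            # Deterministic toy weights; a trained adapter would learn these.
            weights = [((k + i + j) % 3 - 1) * 0.1 for j in range(len(garment_features))]
            token.append(sum(w * f for w, f in zip(weights, garment_features)))
        tokens.append(token)
    return tokens


def build_prompt_embeddings(prompt: List[str], pseudo_tokens: List[Vector]) -> List[Vector]:
    """Look up ordinary words and replace each placeholder with the whole
    sequence of pseudo-word embeddings."""
    sequence: List[Vector] = []
    for word in prompt:
        if word == PLACEHOLDER:
            sequence.extend([list(v) for v in pseudo_tokens])
        else:
            sequence.append(TOKEN_EMBEDDINGS[word])
    return sequence


prompt = ["a", "photo", "of", "a", "model", "wearing", PLACEHOLDER]
pseudo = inversion_adapter([1.0, 2.0, 3.0], num_tokens=2)
sequence = build_prompt_embeddings(prompt, pseudo)
print(len(sequence))  # 6 ordinary tokens + 2 pseudo-words = 8
```

Using several pseudo-words rather than one gives the adapter more capacity to encode texture and detail; the number of tokens here is an arbitrary choice for illustration.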
Quick Start & Requirements
Set up the environment from environment.yml; requirements also include xformers and wandb.
Run inference with python src/inference.py, specifying the dataset and root paths.
Highlighted Details
Maintenance & Community
The project is associated with ACM Multimedia 2023 and lists several academic contributors. Training code was released in September 2023.
Licensing & Compatibility
Limitations & Caveats
The non-commercial license restricts use in commercial products. Training involves multiple stages and requires significant dataset preparation and computational resources.