ladi-vton by miccunifi

Virtual try-on research project using latent diffusion and textual inversion

Created 2 years ago

462 stars

Top 65.7% on SourcePulse

Project Summary

LaDI-VTON performs virtual try-on with a latent diffusion model extended with textual inversion. Given an image of a person and a target garment, it generates a realistic image of that person wearing the garment. It is aimed at researchers and developers working on e-commerce and metaverse applications.

How It Works

The core innovation is a latent diffusion model augmented with a custom autoencoder module featuring learnable skip connections. This design aims to preserve the wearer's characteristics during generation. To accurately represent garment textures and details, a textual inversion component maps garment features to CLIP token embeddings, creating pseudo-word tokens that condition the diffusion process.
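The textual inversion step described above can be illustrated with a small sketch. This is not the project's code: it uses plain Python lists instead of PyTorch so it runs without the project's dependencies, a single random linear layer stands in for the learned adapter, and all names (make_adapter, garment_to_pseudo_tokens, inject_pseudo_tokens) and dimensions are hypothetical. It only shows the general idea of projecting a garment feature vector into a few pseudo-word token embeddings and placing them into a prompt's embedding sequence.

```python
import random

def linear(x, weight, bias):
    """Apply y = W x + b to a single input vector x."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def make_adapter(in_dim, n_tokens, embed_dim, seed=0):
    """Build a hypothetical one-layer adapter with small random weights.

    In a real system these weights would be learned; here they are random.
    """
    rng = random.Random(seed)
    out_dim = n_tokens * embed_dim
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias

def garment_to_pseudo_tokens(features, adapter, n_tokens, embed_dim):
    """Project garment features, then split the result into n_tokens embeddings."""
    weight, bias = adapter
    flat = linear(features, weight, bias)
    return [flat[i * embed_dim:(i + 1) * embed_dim] for i in range(n_tokens)]

def inject_pseudo_tokens(prompt_embeddings, placeholder_positions, pseudo_tokens):
    """Return a copy of the prompt embeddings with placeholders replaced."""
    if len(placeholder_positions) != len(pseudo_tokens):
        raise ValueError("need exactly one position per pseudo-word token")
    out = [list(e) for e in prompt_embeddings]
    for pos, tok in zip(placeholder_positions, pseudo_tokens):
        out[pos] = list(tok)
    return out

# Toy dimensions chosen for readability (assumptions, not the paper's values).
IN_DIM, N_TOKENS, EMBED_DIM = 8, 3, 4

adapter = make_adapter(IN_DIM, N_TOKENS, EMBED_DIM)
garment_features = [0.5] * IN_DIM  # stand-in for visual features of the garment
pseudo = garment_to_pseudo_tokens(garment_features, adapter, N_TOKENS, EMBED_DIM)

# A 6-token prompt whose positions 2..4 are placeholders for the garment.
prompt = [[0.0] * EMBED_DIM for _ in range(6)]
conditioned = inject_pseudo_tokens(prompt, [2, 3, 4], pseudo)
print(len(pseudo), len(pseudo[0]), len(conditioned))
```

In this sketch the conditioned sequence keeps the prompt's original length, so it could be passed to a text encoder or cross-attention layer in place of the ordinary prompt embeddings. How the actual project trains and wires its adapter is documented in its repository and paper.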

Quick Start & Requirements

Installation: Clone the repository and create a conda environment using environment.yml.
Dependencies: Python 3.10, PyTorch 2.0.1, torchvision 0.15.2, CUDA (implied by PyTorch version), xformers, wandb.
Data: Requires DressCode or VITON-HD datasets. Pre-extracted masks for DressCode are available.
Inference: Run python src/inference.py, specifying the target dataset and the path to its root directory.
Documentation: Official Repository

Highlighted Details

Reports state-of-the-art results on the Dress Code and VITON-HD datasets at the time of publication.
Integrates latent diffusion with textual inversion for enhanced garment detail.
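Offers a modular training pipeline with separate stages for the warping module, the EMASC (Enhanced Mask-Aware Skip Connection) modules, the inversion adapter, and the virtual try-on (VTO) model.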

Maintenance & Community

The project is associated with ACM Multimedia 2023 and lists several academic contributors. Training code was released in September 2023.

Licensing & Compatibility

License: Creative Commons BY-NC 4.0 (Attribution-NonCommercial).
Restrictions: Strictly for non-commercial use. Redistribution and adaptation require appropriate credit and indication of changes.

Limitations & Caveats

The non-commercial license restricts use in commercial products. Training involves multiple stages and requires significant dataset preparation and computational resources.

