ladi-vton  by miccunifi

Code for a virtual try-on research paper using latent diffusion and textual inversion

created 2 years ago
447 stars

Top 68.4% on SourcePulse

Project Summary

LaDI-VTON tackles virtual try-on with a latent diffusion model enhanced by textual inversion. It targets researchers and developers working on e-commerce and metaverse applications, and generates realistic images of a model wearing a specified garment.

How It Works

The core innovation is a latent diffusion model augmented with a custom autoencoder module featuring learnable skip connections. This design aims to preserve the wearer's characteristics during generation. To accurately represent garment textures and details, a textual inversion component maps garment features to CLIP token embeddings, creating pseudo-word tokens that condition the diffusion process.
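
To make the conditioning path concrete, here is a minimal toy sketch in plain Python. It is not the repository's implementation. It assumes a garment feature vector is already available, projects it through a hypothetical linear adapter into a few pseudo-word token embeddings, and splices those into a prompt's token-embedding sequence at placeholder positions. The names `LinearAdapter` and `splice_pseudo_tokens`, the tiny dimensions, and the random weights are all illustrative assumptions.

```python
import random


class LinearAdapter:
    """Hypothetical adapter: one garment feature vector -> n pseudo-word token embeddings."""

    def __init__(self, feat_dim, token_dim, n_tokens, seed=0):
        rng = random.Random(seed)
        self.n_tokens = n_tokens
        self.token_dim = token_dim
        # Weight matrix of shape (n_tokens * token_dim, feat_dim).
        # Random values stand in for trained weights (assumption).
        self.weight = [
            [rng.uniform(-0.1, 0.1) for _ in range(feat_dim)]
            for _ in range(n_tokens * token_dim)
        ]

    def __call__(self, feature):
        # Plain matrix-vector product.
        flat = [sum(w * x for w, x in zip(row, feature)) for row in self.weight]
        # Split the flat projection into n_tokens vectors of length token_dim.
        return [
            flat[i * self.token_dim:(i + 1) * self.token_dim]
            for i in range(self.n_tokens)
        ]


def splice_pseudo_tokens(prompt_embeddings, placeholder_positions, pseudo_tokens):
    """Return a copy of the prompt embeddings with placeholders replaced by pseudo-word tokens."""
    if len(placeholder_positions) != len(pseudo_tokens):
        raise ValueError("need exactly one pseudo-token per placeholder position")
    out = [list(vec) for vec in prompt_embeddings]
    for pos, tok in zip(placeholder_positions, pseudo_tokens):
        out[pos] = list(tok)
    return out


# Toy dimensions; real CLIP embeddings are far larger.
FEAT_DIM, TOKEN_DIM, N_TOKENS = 8, 4, 2
adapter = LinearAdapter(FEAT_DIM, TOKEN_DIM, N_TOKENS)
garment_feature = [0.5] * FEAT_DIM              # stand-in for extracted garment features
prompt = [[0.0] * TOKEN_DIM for _ in range(5)]  # stand-in for embedded prompt tokens
conditioning = splice_pseudo_tokens(prompt, [3, 4], adapter(garment_feature))
print(len(conditioning), len(conditioning[0]))
```

In a real pipeline the resulting sequence would pass through the text encoder and condition the denoising network. The sketch stops at the splice because that is the step specific to textual inversion.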

Quick Start & Requirements

  • Installation: Clone the repository and create a conda environment using environment.yml.
  • Dependencies: Python 3.10, PyTorch 2.0.1, torchvision 0.15.2, xformers, wandb. A CUDA-capable GPU is not listed explicitly but is implied by the PyTorch and xformers dependencies.
  • Data: Requires the Dress Code or VITON-HD dataset. Pre-extracted masks for Dress Code are available.
  • Inference: Run python src/inference.py with specified dataset and root paths.
  • Documentation: See the official repository (miccunifi/ladi-vton on GitHub).
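
As a rough illustration of the inference step, the sketch below assembles a command line for `src/inference.py` from a dataset name and a dataset root. The flag names (`--dataset`, `--dresscode_dataroot`, `--vitonhd_dataroot`, `--output_dir`) are assumptions that this summary does not confirm, and the example paths are hypothetical. Check `python src/inference.py --help` or the repository README before relying on them.

```python
import shlex


def build_inference_command(dataset, dataroot, output_dir, script="src/inference.py"):
    """Assemble an argv list for the inference script.

    ASSUMPTION: the flag names below are not given in this summary;
    confirm them against the script's --help output or the README.
    """
    dataroot_flags = {
        "dresscode": "--dresscode_dataroot",
        "vitonhd": "--vitonhd_dataroot",
    }
    if dataset not in dataroot_flags:
        raise ValueError(
            f"unknown dataset {dataset!r}; expected one of {sorted(dataroot_flags)}"
        )
    return [
        "python", script,
        "--dataset", dataset,
        dataroot_flags[dataset], str(dataroot),
        "--output_dir", str(output_dir),
    ]


# Hypothetical paths, for illustration only.
cmd = build_inference_command("vitonhd", "/data/VITON-HD", "./results")
print(shlex.join(cmd))
# To launch it for real: subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string keeps paths with spaces intact when it is later passed to `subprocess.run`.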

Highlighted Details

  • Reports state-of-the-art results on the Dress Code and VITON-HD datasets at the time of publication (ACM Multimedia 2023).
  • Integrates latent diffusion with textual inversion for enhanced garment detail.
  • Offers a modular training pipeline with separate stages for the warping module, EMASC (Enhanced Mask-Aware Skip Connection), the inversion adapter, and the VTO (virtual try-on) module.

Maintenance & Community

The project is associated with ACM Multimedia 2023 and lists several academic contributors. Training code was released in September 2023.

Licensing & Compatibility

  • License: Creative Commons BY-NC 4.0 (Attribution-NonCommercial).
  • Restrictions: Strictly for non-commercial use. Redistribution and adaptation require appropriate credit and indication of changes.

Limitations & Caveats

The non-commercial license restricts use in commercial products. Training involves multiple stages and requires significant dataset preparation and computational resources.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1

Star History

10 stars in the last 90 days
