Magic-TryOn by vivoCameraResearch

Video virtual try-on framework

Created 3 months ago
454 stars

Top 66.5% on SourcePulse

View on GitHub
Project Summary

MagicTryOn is a video virtual try-on framework designed for researchers and developers in computer vision and graphics. It leverages a large-scale video diffusion Transformer to enable realistic garment-preserving virtual try-on experiences, offering a coarse-to-fine strategy for enhanced garment fidelity.

How It Works

The framework utilizes a Wan2.1 diffusion Transformer backbone with full self-attention to maintain spatiotemporal consistency across video frames. A key innovation is its coarse-to-fine garment preservation strategy, augmented by a mask-aware loss function, which specifically targets and enhances the fidelity of the garment region during the try-on process.
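
The coarse-to-fine strategy and mask-aware loss are described here only at a high level. As a rough illustration, the sketch below up-weights the diffusion denoising error inside the garment mask; the function name, tensor shapes, and weighting scheme are assumptions made for illustration, not the repository's actual implementation.

```python
import torch
import torch.nn.functional as F

def mask_aware_diffusion_loss(pred_noise: torch.Tensor,
                              true_noise: torch.Tensor,
                              garment_mask: torch.Tensor,
                              garment_weight: float = 2.0) -> torch.Tensor:
    """Illustrative mask-aware loss: up-weight the denoising error inside
    the garment region so the model preserves clothing detail.

    pred_noise, true_noise: (B, C, T, H, W) predicted / target noise
    garment_mask:           (B, 1, T, H, W) binary mask, 1 inside the garment
    garment_weight:         assumed extra weight for garment pixels
    """
    per_pixel = F.mse_loss(pred_noise, true_noise, reduction="none")
    # 1.0 everywhere, garment_weight inside the mask; broadcasts over channels.
    weights = 1.0 + (garment_weight - 1.0) * garment_mask
    return (weights * per_pixel).mean()
```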

Quick Start & Requirements

  • Installation: Create a conda environment (conda create -n magictryon python==3.12.9), activate it (conda activate magictryon), and install dependencies with pip install -r requirements.txt. Alternatively, build the environment in one step with conda env create -f environment.yaml. The full command sequence is consolidated after this list.
  • Prerequisites: Python 3.12.9, CUDA 12.3, PyTorch 2.2. Manual installation of Flash Attention may be required based on your environment.
  • Weights: Download pretrained weights from HuggingFace using HF_ENDPOINT=https://hf-mirror.com huggingface-cli download LuckyLiGY/MagicTryOn --local-dir ./weights/MagicTryOn_14B_V1.
  • Demo: Inference scripts for image and video try-on are provided. Custom try-on requires additional steps for garment captioning, line map extraction, mask generation, agnostic representation, and DensePose estimation.
  • Documentation: Paper on arXiv
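
The setup steps above can be run end to end as follows; all commands are taken verbatim from the bullets (HF_ENDPOINT points huggingface-cli at the hf-mirror endpoint and can be dropped if huggingface.co is reachable directly):

```bash
# Environment setup (Python 3.12.9, CUDA 12.3, PyTorch 2.2 expected)
conda create -n magictryon python==3.12.9
conda activate magictryon
pip install -r requirements.txt
# Alternatively: conda env create -f environment.yaml

# Download pretrained weights
HF_ENDPOINT=https://hf-mirror.com huggingface-cli download LuckyLiGY/MagicTryOn \
    --local-dir ./weights/MagicTryOn_14B_V1
```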

Highlighted Details

  • Employs a large-scale video diffusion Transformer (Wan2.1-I2V-14B) as the backbone.
  • Features full self-attention for modeling spatiotemporal consistency.
  • Introduces a coarse-to-fine garment preservation strategy with mask-aware loss.
  • Supports customized try-on with components for garment captioning (Qwen2.5-VL-7B-Instruct), line map extraction (AniLines), mask generation, agnostic representation, and DensePose estimation; see the sketch after this list.
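
As a rough map of how the customization stages fit together, the sketch below wires the components into a single preprocessing step. Every function name here is a hypothetical placeholder invented for illustration; the repository ships separate scripts for each stage rather than a single entry point like this.

```python
# Hypothetical sketch of the customized try-on preprocessing flow.
# All helper names are placeholders, not the repository's API; each stage
# corresponds to one of the external tools listed above.

def caption_garment(garment_image):
    """Placeholder for garment captioning (e.g. Qwen2.5-VL-7B-Instruct)."""
    raise NotImplementedError

def extract_line_map(garment_image):
    """Placeholder for structural line map extraction (e.g. AniLines)."""
    raise NotImplementedError

def generate_masks(person_video):
    """Placeholder for per-frame garment mask generation."""
    raise NotImplementedError

def make_agnostic(person_video, masks):
    """Placeholder for the garment-agnostic person representation."""
    raise NotImplementedError

def estimate_densepose(person_video):
    """Placeholder for per-frame DensePose estimation."""
    raise NotImplementedError

def prepare_custom_inputs(person_video, garment_image):
    """Gather all conditioning signals the try-on model consumes."""
    masks = generate_masks(person_video)
    return {
        "caption": caption_garment(garment_image),
        "line_map": extract_line_map(garment_image),
        "masks": masks,
        "agnostic": make_agnostic(person_video, masks),
        "densepose": estimate_densepose(person_video),
    }
```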

Maintenance & Community

The project is under active development, with code and pretrained weights recently released. Planned updates include a Gradio app, V1.3B weights, testing scripts, and training scripts.

Licensing & Compatibility

Released under the Creative Commons BY-NC-SA 4.0 license. This license permits copying, redistribution, remixing, and transformation for non-commercial purposes, provided appropriate credit is given and derivative works are shared under the same license. Commercial use is prohibited, and the ShareAlike requirement effectively precludes incorporating the code into closed-source projects.

Limitations & Caveats

The framework is restricted to non-commercial use by its CC BY-NC-SA 4.0 license. The customized try-on pipeline involves multiple preprocessing steps and external model dependencies, which may require significant setup and troubleshooting.

Health Check

  • Last commit: 4 weeks ago
  • Responsiveness: Inactive
  • Pull requests (30d): 4
  • Issues (30d): 6
  • Star history: 27 stars in the last 30 days
