Magic-TryOn by vivoCameraResearch

Video virtual try-on framework

Created 3 months ago
454 stars

Top 66.5% on SourcePulse

View on GitHub
Project Summary

MagicTryOn is a video virtual try-on framework designed for researchers and developers in computer vision and graphics. It leverages a large-scale video diffusion Transformer to enable realistic garment-preserving virtual try-on experiences, offering a coarse-to-fine strategy for enhanced garment fidelity.

How It Works

The framework utilizes a Wan2.1 diffusion Transformer backbone with full self-attention to maintain spatiotemporal consistency across video frames. A key innovation is its coarse-to-fine garment preservation strategy, augmented by a mask-aware loss function, which specifically targets and enhances the fidelity of the garment region during the try-on process.
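
The coarse-to-fine strategy and mask-aware loss are described here only at a high level. As a rough illustration, the sketch below up-weights the diffusion denoising error inside the garment mask; the function name, tensor shapes, and weighting scheme are assumptions made for illustration, not the repository's actual implementation.

```python
import torch
import torch.nn.functional as F

def mask_aware_diffusion_loss(pred_noise: torch.Tensor,
                              true_noise: torch.Tensor,
                              garment_mask: torch.Tensor,
                              garment_weight: float = 2.0) -> torch.Tensor:
    """Illustrative mask-aware loss: up-weight the denoising error inside
    the garment region so the model preserves clothing detail.

    pred_noise, true_noise: (B, C, T, H, W) predicted / target noise
    garment_mask:           (B, 1, T, H, W) binary mask, 1 inside the garment
    garment_weight:         assumed extra weight for garment pixels
    """
    per_pixel = F.mse_loss(pred_noise, true_noise, reduction="none")
    # 1.0 everywhere, garment_weight inside the mask; broadcasts over channels.
    weights = 1.0 + (garment_weight - 1.0) * garment_mask
    return (weights * per_pixel).mean()
```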

Quick Start & Requirements

  • Installation: Create a conda environment (conda create -n magictryon python==3.12.9), activate it (conda activate magictryon), and install dependencies with pip install -r requirements.txt. Alternatively, build the environment in one step with conda env create -f environment.yaml. The full command sequence is consolidated after this list.
  • Prerequisites: Python 3.12.9, CUDA 12.3, PyTorch 2.2. Manual installation of Flash Attention may be required based on your environment.
  • Weights: Download pretrained weights from HuggingFace using HF_ENDPOINT=https://hf-mirror.com huggingface-cli download LuckyLiGY/MagicTryOn --local-dir ./weights/MagicTryOn_14B_V1.
  • Demo: Inference scripts for image and video try-on are provided. Custom try-on requires additional steps for garment captioning, line map extraction, mask generation, agnostic representation, and DensePose estimation.
  • Documentation: Paper on arXiv
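
The setup steps above can be run end to end as follows; all commands are taken verbatim from the bullets (HF_ENDPOINT points huggingface-cli at the hf-mirror endpoint and can be dropped if huggingface.co is reachable directly):

```bash
# Environment setup (Python 3.12.9, CUDA 12.3, PyTorch 2.2 expected)
conda create -n magictryon python==3.12.9
conda activate magictryon
pip install -r requirements.txt
# Alternatively: conda env create -f environment.yaml

# Download pretrained weights
HF_ENDPOINT=https://hf-mirror.com huggingface-cli download LuckyLiGY/MagicTryOn \
    --local-dir ./weights/MagicTryOn_14B_V1
```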

Highlighted Details

  • Employs a large-scale video diffusion Transformer (Wan2.1-I2V-14B) as the backbone.
  • Features full self-attention for modeling spatiotemporal consistency.
  • Introduces a coarse-to-fine garment preservation strategy with mask-aware loss.
  • Supports customized try-on with components for garment captioning (Qwen2.5-VL-7B-Instruct), line map extraction (AniLines), mask generation, agnostic representation, and DensePose estimation; see the sketch after this list.
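
As a rough map of how the customization stages fit together, the sketch below wires the components into a single preprocessing step. Every function name here is a hypothetical placeholder invented for illustration; the repository ships separate scripts for each stage rather than a single entry point like this.

```python
# Hypothetical sketch of the customized try-on preprocessing flow.
# All helper names are placeholders, not the repository's API; each stage
# corresponds to one of the external tools listed above.

def caption_garment(garment_image):
    """Placeholder for garment captioning (e.g. Qwen2.5-VL-7B-Instruct)."""
    raise NotImplementedError

def extract_line_map(garment_image):
    """Placeholder for structural line map extraction (e.g. AniLines)."""
    raise NotImplementedError

def generate_masks(person_video):
    """Placeholder for per-frame garment mask generation."""
    raise NotImplementedError

def make_agnostic(person_video, masks):
    """Placeholder for the garment-agnostic person representation."""
    raise NotImplementedError

def estimate_densepose(person_video):
    """Placeholder for per-frame DensePose estimation."""
    raise NotImplementedError

def prepare_custom_inputs(person_video, garment_image):
    """Gather all conditioning signals the try-on model consumes."""
    masks = generate_masks(person_video)
    return {
        "caption": caption_garment(garment_image),
        "line_map": extract_line_map(garment_image),
        "masks": masks,
        "agnostic": make_agnostic(person_video, masks),
        "densepose": estimate_densepose(person_video),
    }
```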

Maintenance & Community

The project is under active development, with code and pretrained weights recently released. Planned updates include a Gradio app, V1.3B weights, testing scripts, and training scripts.

Licensing & Compatibility

Released under the Creative Commons BY-NC-SA 4.0 license. This license permits copying, redistribution, remixing, and transformation for non-commercial purposes, provided appropriate credit is given and derivative works are shared under the same license. Commercial use is prohibited, and the ShareAlike requirement effectively precludes incorporating the code into closed-source projects.

Limitations & Caveats

The framework is restricted to non-commercial use by its CC BY-NC-SA 4.0 license. The customized try-on pipeline involves multiple preprocessing steps and external model dependencies, which may require significant setup and troubleshooting.

Health Check

  • Last commit: 4 weeks ago
  • Responsiveness: Inactive
  • Pull requests (30d): 4
  • Issues (30d): 6
  • Star history: 27 stars in the last 30 days
