Virtual try-on diffusion model research paper
CatVTON is a diffusion model for virtual try-on, designed for efficiency and ease of use. It targets researchers and developers in computer vision and fashion tech, offering a lightweight architecture for high-resolution image generation with reduced VRAM requirements.
How It Works
CatVTON builds on Stable Diffusion v1.5. Its novelty lies in a "concatenation" approach, which enables parameter-efficient training and simplified inference. The full network has 899.06M parameters, of which only 49.57M are trainable, and inference at 1024x768 resolution requires less than 8GB of VRAM.
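To put the figures above in proportion, here is a minimal Python sketch of the arithmetic. The parameter counts are taken from the text. The latent shape uses the standard Stable Diffusion v1.5 VAE (8x spatial downsampling, 4 latent channels). Three things are assumptions made for illustration and are not stated in the summary: that 1024x768 means height 1024 by width 768, that the concatenation is spatial (person and garment latents joined along one axis), and that the axis is the height axis. The helper names are hypothetical and are not part of the CatVTON codebase.

```python
def trainable_fraction(total_m: float, trainable_m: float) -> float:
    """Share of parameters that are updated during training."""
    return trainable_m / total_m


def latent_shape(height: int, width: int, channels: int = 4, downsample: int = 8):
    """Latent (C, H, W) shape for an SD v1.5-style VAE (8x downsampling, 4 channels)."""
    return (channels, height // downsample, width // downsample)


def concat_shape(a: tuple, b: tuple, axis: int) -> tuple:
    """Shape after concatenating two tensors of shapes a and b along `axis`."""
    if len(a) != len(b):
        raise ValueError("shapes must have the same number of dimensions")
    for i, (x, y) in enumerate(zip(a, b)):
        if i != axis and x != y:
            raise ValueError(f"dimension {i} differs: {x} vs {y}")
    out = list(a)
    out[axis] += b[axis]
    return tuple(out)


# Figures quoted in the text, in millions of parameters.
TOTAL_M = 899.06
TRAINABLE_M = 49.57

frac = trainable_fraction(TOTAL_M, TRAINABLE_M)
frozen_m = TOTAL_M - TRAINABLE_M

# Assumption: 1024x768 is height x width (portrait orientation).
person = latent_shape(1024, 768)
garment = latent_shape(1024, 768)

# Assumption: the two latents are joined along the height axis (axis=1).
joined = concat_shape(person, garment, axis=1)

print(f"trainable share: {frac:.2%}")
print(f"frozen parameters: {frozen_m:.2f}M")
print(f"person latent: {person}, concatenated input: {joined}")
```

Under these assumptions roughly 5.5% of the network is trained and the rest stays frozen, which is where the parameter efficiency comes from.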
Quick Start & Requirements
Install dependencies with pip install -r requirements.txt within a conda environment.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project is primarily tested on Linux; Windows users may encounter issues (see issue #8). The Gradio app is noted as not yet stable.