CatVTON by Zheng-Chong

Virtual try-on diffusion model research paper

created 1 year ago
1,458 stars

Top 28.7% on sourcepulse

Project Summary

CatVTON is a diffusion model for virtual try-on, designed for efficiency and ease of use. It targets researchers and developers in computer vision and fashion tech, offering a lightweight architecture for high-resolution image generation with reduced VRAM requirements.

How It Works

CatVTON builds on Stable Diffusion v1.5. Its novelty lies in a "concatenation" approach that enables parameter-efficient training and simplified inference. The full network has 899.06M parameters, of which only 49.57M are trainable, and inference at 1024x768 resolution requires less than 8GB of VRAM.
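To put the parameter counts in perspective, the short sketch below computes the trainable share and a rough weights-only memory footprint from the figures quoted above. The helper names are illustrative, not part of CatVTON, and the memory estimate assumes 2 bytes per parameter for fp16 and 4 for fp32; it ignores activations, the VAE's working buffers, and framework overhead, so it is a lower bound rather than a reproduction of the sub-8GB figure.

```python
# Figures quoted in the summary (in millions of parameters).
TOTAL_PARAMS_M = 899.06
TRAINABLE_PARAMS_M = 49.57


def trainable_fraction(trainable_m: float, total_m: float) -> float:
    """Share of parameters updated during training, as a value in [0, 1]."""
    return trainable_m / total_m


def weight_memory_gb(params_m: float, bytes_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (10**9 bytes).

    Assumption: every parameter is stored at the same precision.
    """
    return params_m * 1e6 * bytes_per_param / 1e9


if __name__ == "__main__":
    frac = trainable_fraction(TRAINABLE_PARAMS_M, TOTAL_PARAMS_M)
    print(f"trainable share: {frac:.2%}")                                 # 5.51%
    print(f"fp16 weights: {weight_memory_gb(TOTAL_PARAMS_M, 2):.2f} GB")  # 1.80 GB
    print(f"fp32 weights: {weight_memory_gb(TOTAL_PARAMS_M, 4):.2f} GB")  # 3.60 GB
```

Roughly 5.5% of the network is trained, and at half precision the weights alone occupy well under a quarter of the stated 8GB budget.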

Quick Start & Requirements

  • Install: pip install -r requirements.txt within a conda environment.
  • Prerequisites: Python 3.9.0, CUDA. Datasets like VITON-HD or DressCode are required for inference.
  • Deployment: ComfyUI workflow and Gradio app are available.
  • Docs: https://arxiv.org/abs/2407.15886
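Before installing, it can help to confirm that the prerequisites listed above are in place. The following is a minimal preflight sketch, not something shipped with the project: `check_environment` is a hypothetical helper, and treating the presence of `nvidia-smi` on `PATH` as evidence of a working CUDA driver is an assumption, not a guarantee.

```python
import shutil
import sys

# The summary lists Python 3.9.0; only major.minor is compared here.
REQUIRED_PYTHON = (3, 9)


def check_environment(version_info=sys.version_info, which=shutil.which):
    """Return a list of human-readable problems; an empty list means OK.

    `version_info` and `which` are injectable so the check can be exercised
    without changing the interpreter or the PATH.
    """
    problems = []
    found = tuple(version_info[:2])
    if found != REQUIRED_PYTHON:
        problems.append(
            f"Python {REQUIRED_PYTHON[0]}.{REQUIRED_PYTHON[1]} expected, "
            f"found {found[0]}.{found[1]}"
        )
    # Heuristic (assumption): nvidia-smi is normally installed with the driver.
    if which("nvidia-smi") is None:
        problems.append("nvidia-smi not found on PATH; CUDA driver may be missing")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("warning:", problem)
```

A mismatch is reported as a warning, not a hard failure, since other Python versions may also work even though the summary names only 3.9.0.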

Highlighted Details

  • Accepted to ICLR 2025.
  • Supports 1024x768 resolution with < 8GB VRAM.
  • Parameter-efficient training (49.57M trainable parameters).
  • Mask-free version available.
  • Integrates with ComfyUI.

Maintenance & Community

  • Active development with recent updates (CatV2TON, FLUX.1-Fill-dev LoRA).
  • HuggingFace Space available.
  • Maintains an "Awesome-Try-On-Models" repository.

Licensing & Compatibility

  • Licensed under Creative Commons BY-NC-SA 4.0.
  • Non-commercial use only. Contributions must be shared under the same license.

Limitations & Caveats

The project is primarily tested on Linux; Windows users may encounter issues (see issue #8). The Gradio app is described as not yet stable.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 1
  • Issues (30d): 2
  • Star history: 121 stars in the last 90 days
