multimodal-garment-designer by aimagelab

AI model for fashion image editing via multimodal prompts

created 2 years ago
433 stars

Top 69.9% on SourcePulse

Project Summary

This repository provides the official implementation for "Multimodal Garment Designer," a novel approach to fashion image editing using human-centric latent diffusion models. It enables fashion image generation and editing guided by multimodal prompts like text, body poses, and garment sketches, targeting fashion designers and researchers in computer vision and AI for fashion.

How It Works

The project applies latent diffusion models to fashion image editing, a task to which they had not previously been applied. It introduces a new architecture designed for multimodal conditioning, allowing precise control over the generated imagery. Its key advantage is the ability to integrate diverse input modalities (text, pose, sketch) into realistic and coherent fashion image manipulation.
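To make the conditioning idea concrete, here is a minimal, dependency-free sketch of one common pattern for spatial conditions in latent diffusion: the noisy latent is stacked channel-wise with spatially aligned condition maps (body pose, garment sketch), while the text prompt typically conditions the UNet separately through attention. All shapes and names below are illustrative assumptions, not the project's actual code.

```python
# Illustrative sketch (pure Python): build the denoiser input by stacking
# the latent's channels with spatially aligned condition maps. The text
# embedding is NOT stacked here; it usually enters via cross-attention.

def build_denoiser_input(latent, pose_map, sketch_map):
    """Concatenate along the channel axis: latent channels + pose + sketch."""
    for cond in (pose_map, sketch_map):
        # Each condition must match the latent's spatial resolution.
        assert len(cond) == len(latent[0]) and len(cond[0]) == len(latent[0][0])
    return latent + [pose_map, sketch_map]

H, W = 4, 4
latent = [[[0.0] * W for _ in range(H)] for _ in range(4)]  # 4 latent channels (HxW each)
pose   = [[1.0] * W for _ in range(H)]                      # pose keypoint map
sketch = [[0.5] * W for _ in range(H)]                      # garment sketch map

x = build_denoiser_input(latent, pose, sketch)
print(len(x))  # 6 channels fed to the denoising UNet
```

The text embedding is kept out of the stack because it has no spatial layout; attention-based conditioning is the standard way diffusion models consume it.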

Quick Start & Requirements

  • Installation: Recommended via Anaconda. Clone the repo and create an environment: git clone https://github.com/aimagelab/multimodal-garment-designer followed by conda env create -n mgd -f environment.yml and conda activate mgd.
  • Dependencies: Python 3.9, PyTorch 1.12.1, CUDA (implied by PyTorch version), diffusers 0.12.0, transformers 4.25.1.
  • Inference: python src/eval.py --dataset_path <path> --batch_size <int> --mixed_precision fp16 --output_dir <path> --save_name <string> --num_workers_test <int> --sketch_cond_rate 0.2 --dataset <dresscode|vitonhd> --start_cond_rate 0.0 --test_order <paired|unpaired>
  • Data: Requires downloading original Viton-HD and Dress Code datasets, plus additional multimodal annotations. See dataset preparation instructions.

Highlighted Details

  • Extends Dress Code and VITON-HD datasets with semi-automatically collected multimodal annotations.
  • Offers pre-trained models loadable via torch.hub.
  • Provides a custom MGDPipe integrating the MGD denoising UNet with standard diffusers components.
  • Introduces Ti-MGD, a more recent work incorporating fabric texture conditioning.
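The MGDPipe bullet above describes a custom pipeline that composes a bespoke denoising UNet with standard diffusers components. The following is a torch-free stand-in that illustrates only the composition pattern; every class, method, and constant here is an assumption for illustration and not the project's real API (the real pipeline would use a diffusers scheduler, VAE, and text encoder).

```python
# Stand-in sketch of the "custom pipeline wraps a custom UNet plus standard
# components" pattern. Numbers and update rules are arbitrary; they only
# make the loop runnable without torch/diffusers.

class StubScheduler:
    def step(self, noise_pred, t, latent):
        # A real scheduler (e.g. DDIM) applies its update rule here;
        # this stub just moves the latent against the predicted noise.
        return [l - 0.5 * n for l, n in zip(latent, noise_pred)]

class StubUNet:
    def __call__(self, latent, t, text_emb, pose, sketch):
        # The MGD UNet is conditioned on text, pose, and sketch;
        # this stub fakes a noise prediction proportional to the latent.
        return [0.5 * l for l in latent]

class MGDPipeSketch:
    def __init__(self, unet, scheduler):
        self.unet = unet
        self.scheduler = scheduler

    def __call__(self, latent, text_emb, pose, sketch, steps=3):
        # Classic reverse-diffusion loop: predict noise, step the scheduler.
        for t in reversed(range(steps)):
            noise_pred = self.unet(latent, t, text_emb, pose, sketch)
            latent = self.scheduler.step(noise_pred, t, latent)
        return latent

pipe = MGDPipeSketch(StubUNet(), StubScheduler())
out = pipe([1.0, -2.0], text_emb=None, pose=None, sketch=None)
print(out)  # → [0.421875, -0.84375]
```

The value of this pattern is that the custom UNet stays swappable while scheduler, VAE, and text encoder come unmodified from the diffusers ecosystem.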

Maintenance & Community

The paper was presented at ICCV 2023. Follow-up work on multimodal fashion image editing is described in a more recent publication. The README mentions no community channels (Discord/Slack) or roadmap.

Licensing & Compatibility

Licensed under Creative Commons BY-NC 4.0. This license permits redistribution and adaptation for non-commercial purposes only, requiring appropriate credit and indication of changes.

Limitations & Caveats

The repository explicitly states that training code is a future TODO item. The license restricts commercial use, which may limit adoption in commercial product development.

Health Check
Last commit

1 year ago

Responsiveness

1 week

Pull Requests (30d)
0
Issues (30d)
0
Star History
11 stars in the last 90 days

Explore Similar Projects

Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Omar Sanseviero (DevRel at Google DeepMind), and 1 more.

EditAnything by sail-sg

Top 0.1% on SourcePulse
3k stars
Image editing research paper using segmentation and diffusion
created 2 years ago
updated 5 months ago