multimodal-garment-designer  by aimagelab

AI model for fashion image editing via multimodal prompts

Created 2 years ago
437 stars

Top 68.2% on SourcePulse

Project Summary

This repository provides the official implementation for "Multimodal Garment Designer," a novel approach to fashion image editing using human-centric latent diffusion models. It enables fashion image generation and editing guided by multimodal prompts like text, body poses, and garment sketches, targeting fashion designers and researchers in computer vision and AI for fashion.

How It Works

The project builds on latent diffusion models, which the authors describe as not previously applied to the fashion domain. It introduces an architecture that conditions the denoising process on several modalities at once (text, body pose, and garment sketch), giving finer control over the generated image. Combining these inputs is what lets the model produce realistic, coherent edits of fashion images.

Quick Start & Requirements

  • Installation: Anaconda is recommended. Clone the repository with git clone https://github.com/aimagelab/multimodal-garment-designer, then from the repository root run conda env create -n mgd -f environment.yml followed by conda activate mgd.
  • Dependencies: Python 3.9, PyTorch 1.12.1, CUDA (implied by PyTorch version), diffusers 0.12.0, transformers 4.25.1.
  • Inference: python src/eval.py --dataset_path <path> --batch_size <int> --mixed_precision fp16 --output_dir <path> --save_name <string> --num_workers_test <int> --sketch_cond_rate 0.2 --dataset <dresscode|vitonhd> --start_cond_rate 0.0 --test_order <paired|unpaired>
  • Data: Requires downloading the original VITON-HD and Dress Code datasets, plus the additional multimodal annotations. See the dataset preparation instructions in the repository README.
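
The inference command above has many flags, so a small wrapper can make it harder to mistype. The following is a minimal sketch, not part of the repository: build_eval_command is a hypothetical helper name, and it only assembles the argument list shown in the Inference bullet. The flag names and the values fp16, 0.2, and 0.0 come from that bullet; the example paths, batch size, and worker count are placeholders.

```python
import shlex
from typing import List

# Allowed values as listed in the inference command above.
DATASETS = ("dresscode", "vitonhd")
TEST_ORDERS = ("paired", "unpaired")


def build_eval_command(
    *,
    dataset_path: str,
    output_dir: str,
    save_name: str,
    dataset: str,
    test_order: str,
    batch_size: int,
    num_workers_test: int,
    sketch_cond_rate: float = 0.2,   # value used in the documented command
    start_cond_rate: float = 0.0,    # value used in the documented command
    mixed_precision: str = "fp16",   # value used in the documented command
) -> List[str]:
    """Assemble the argv list for src/eval.py (hypothetical helper)."""
    if dataset not in DATASETS:
        raise ValueError(f"dataset must be one of {DATASETS}, got {dataset!r}")
    if test_order not in TEST_ORDERS:
        raise ValueError(f"test_order must be one of {TEST_ORDERS}, got {test_order!r}")
    return [
        "python", "src/eval.py",
        "--dataset_path", str(dataset_path),
        "--batch_size", str(batch_size),
        "--mixed_precision", mixed_precision,
        "--output_dir", str(output_dir),
        "--save_name", save_name,
        "--num_workers_test", str(num_workers_test),
        "--sketch_cond_rate", str(sketch_cond_rate),
        "--dataset", dataset,
        "--start_cond_rate", str(start_cond_rate),
        "--test_order", test_order,
    ]


# Placeholder paths and sizes, for illustration only.
cmd = build_eval_command(
    dataset_path="/data/vitonhd",
    output_dir="outputs",
    save_name="demo",
    dataset="vitonhd",
    test_order="paired",
    batch_size=4,
    num_workers_test=2,
)
print(shlex.join(cmd))
```

To actually launch inference, the list could be passed to subprocess.run(cmd, check=True) from the repository root, inside the activated mgd environment.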

Highlighted Details

  • Extends Dress Code and VITON-HD datasets with semi-automatically collected multimodal annotations.
  • Offers pre-trained models loadable via torch.hub.
  • Provides a custom MGDPipe integrating the MGD denoising UNet with standard diffusers components.
  • Points to Ti-MGD, a more recent follow-up work that adds fabric texture conditioning.
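
Loading the pre-trained weights through torch.hub might look roughly like the sketch below. torch.hub.load and its source argument are standard PyTorch. The entry-point name "mgd" and the dataset and pretrained keyword arguments are assumptions that this summary does not confirm, so check them against the repository's hubconf.py or README before relying on them. mgd_hub_call and load_mgd_unet are hypothetical helper names.

```python
from typing import Any, Dict, Tuple

# GitHub "owner/repo" string, taken from the clone URL in Quick Start.
REPO = "aimagelab/multimodal-garment-designer"
DATASETS = ("dresscode", "vitonhd")


def mgd_hub_call(dataset: str) -> Tuple[Tuple[str, str], Dict[str, Any]]:
    """Return (args, kwargs) for torch.hub.load (hypothetical helper)."""
    if dataset not in DATASETS:
        raise ValueError(f"dataset must be one of {DATASETS}, got {dataset!r}")
    # ASSUMPTION: the hub entry point is named "mgd".
    args = (REPO, "mgd")
    # ASSUMPTION: the entry point accepts `dataset` and `pretrained`;
    # torch.hub.load forwards unknown keyword arguments to the entry point.
    kwargs = {"source": "github", "dataset": dataset, "pretrained": True}
    return args, kwargs


def load_mgd_unet(dataset: str):
    """Download and return the pre-trained denoising UNet (needs network access)."""
    import torch  # imported lazily so the helper above works without PyTorch

    args, kwargs = mgd_hub_call(dataset)
    return torch.hub.load(*args, **kwargs)


args, kwargs = mgd_hub_call("vitonhd")
print(args, kwargs)
```

Calling load_mgd_unet("vitonhd") would fetch the repository and weights on first use. The returned UNet is the component that MGDPipe combines with the standard diffusers parts.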

Maintenance & Community

The accompanying paper was published at ICCV 2023. The authors' follow-up work on multimodal fashion image editing (Ti-MGD) is described in a more recent publication. The README mentions no community channels (Discord/Slack) and no roadmap.

Licensing & Compatibility

Licensed under Creative Commons BY-NC 4.0. This license permits redistribution and adaptation for non-commercial purposes only, requiring appropriate credit and indication of changes.

Limitations & Caveats

The repository lists training code as a TODO item, so only inference with the pre-trained models is currently available. The non-commercial license may limit adoption in commercial product development.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 2 stars in the last 30 days
