multimodal-garment-designer by aimagelab

AI model for fashion image editing via multimodal prompts

created 2 years ago
433 stars

Top 69.9% on SourcePulse

Project Summary

This repository provides the official implementation for "Multimodal Garment Designer," a novel approach to fashion image editing using human-centric latent diffusion models. It enables fashion image generation and editing guided by multimodal prompts like text, body poses, and garment sketches, targeting fashion designers and researchers in computer vision and AI for fashion.

How It Works

The project applies latent diffusion models to fashion image editing, a task to which they had not previously been applied. It introduces a new architecture designed for multimodal conditioning, allowing precise control over the generated imagery. Its key advantage is the ability to integrate diverse input modalities (text, pose, sketch) into realistic and coherent fashion image manipulation.
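To make the conditioning idea concrete, here is a minimal, dependency-free sketch of one common pattern for spatial conditions in latent diffusion: the noisy latent is stacked channel-wise with spatially aligned condition maps (body pose, garment sketch), while the text prompt typically conditions the UNet separately through attention. All shapes and names below are illustrative assumptions, not the project's actual code.

```python
# Illustrative sketch (pure Python): build the denoiser input by stacking
# the latent's channels with spatially aligned condition maps. The text
# embedding is NOT stacked here; it usually enters via cross-attention.

def build_denoiser_input(latent, pose_map, sketch_map):
    """Concatenate along the channel axis: latent channels + pose + sketch."""
    for cond in (pose_map, sketch_map):
        # Each condition must match the latent's spatial resolution.
        assert len(cond) == len(latent[0]) and len(cond[0]) == len(latent[0][0])
    return latent + [pose_map, sketch_map]

H, W = 4, 4
latent = [[[0.0] * W for _ in range(H)] for _ in range(4)]  # 4 latent channels (HxW each)
pose   = [[1.0] * W for _ in range(H)]                      # pose keypoint map
sketch = [[0.5] * W for _ in range(H)]                      # garment sketch map

x = build_denoiser_input(latent, pose, sketch)
print(len(x))  # 6 channels fed to the denoising UNet
```

The text embedding is kept out of the stack because it has no spatial layout; attention-based conditioning is the standard way diffusion models consume it.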

Quick Start & Requirements

  • Installation: Recommended via Anaconda. Clone the repo and create an environment: git clone https://github.com/aimagelab/multimodal-garment-designer followed by conda env create -n mgd -f environment.yml and conda activate mgd.
  • Dependencies: Python 3.9, PyTorch 1.12.1, CUDA (implied by PyTorch version), diffusers 0.12.0, transformers 4.25.1.
  • Inference: python src/eval.py --dataset_path <path> --batch_size <int> --mixed_precision fp16 --output_dir <path> --save_name <string> --num_workers_test <int> --sketch_cond_rate 0.2 --dataset <dresscode|vitonhd> --start_cond_rate 0.0 --test_order <paired|unpaired>
  • Data: Requires downloading original Viton-HD and Dress Code datasets, plus additional multimodal annotations. See dataset preparation instructions.

Highlighted Details

  • Extends Dress Code and VITON-HD datasets with semi-automatically collected multimodal annotations.
  • Offers pre-trained models loadable via torch.hub.
  • Provides a custom MGDPipe integrating the MGD denoising UNet with standard diffusers components.
  • Introduces Ti-MGD, a more recent work incorporating fabric texture conditioning.
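The MGDPipe bullet above describes a custom pipeline that composes a bespoke denoising UNet with standard diffusers components. The following is a torch-free stand-in that illustrates only the composition pattern; every class, method, and constant here is an assumption for illustration and not the project's real API (the real pipeline would use a diffusers scheduler, VAE, and text encoder).

```python
# Stand-in sketch of the "custom pipeline wraps a custom UNet plus standard
# components" pattern. Numbers and update rules are arbitrary; they only
# make the loop runnable without torch/diffusers.

class StubScheduler:
    def step(self, noise_pred, t, latent):
        # A real scheduler (e.g. DDIM) applies its update rule here;
        # this stub just moves the latent against the predicted noise.
        return [l - 0.5 * n for l, n in zip(latent, noise_pred)]

class StubUNet:
    def __call__(self, latent, t, text_emb, pose, sketch):
        # The MGD UNet is conditioned on text, pose, and sketch;
        # this stub fakes a noise prediction proportional to the latent.
        return [0.5 * l for l in latent]

class MGDPipeSketch:
    def __init__(self, unet, scheduler):
        self.unet = unet
        self.scheduler = scheduler

    def __call__(self, latent, text_emb, pose, sketch, steps=3):
        # Classic reverse-diffusion loop: predict noise, step the scheduler.
        for t in reversed(range(steps)):
            noise_pred = self.unet(latent, t, text_emb, pose, sketch)
            latent = self.scheduler.step(noise_pred, t, latent)
        return latent

pipe = MGDPipeSketch(StubUNet(), StubScheduler())
out = pipe([1.0, -2.0], text_emb=None, pose=None, sketch=None)
print(out)  # → [0.421875, -0.84375]
```

The value of this pattern is that the custom UNet stays swappable while scheduler, VAE, and text encoder come unmodified from the diffusers ecosystem.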

Maintenance & Community

The paper was presented at ICCV 2023. Follow-up work on multimodal fashion image editing is described in a more recent publication. The README mentions no community channels (Discord/Slack) or roadmap.

Licensing & Compatibility

Licensed under Creative Commons BY-NC 4.0. This license permits redistribution and adaptation for non-commercial purposes only, requiring appropriate credit and indication of changes.

Limitations & Caveats

The repository explicitly states that training code is a future TODO item. The license restricts commercial use, which may limit adoption in commercial product development.

Health Check
Last commit

1 year ago

Responsiveness

1 week

Pull Requests (30d)
0
Issues (30d)
0
Star History
11 stars in the last 90 days

Explore Similar Projects

Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Omar Sanseviero (DevRel at Google DeepMind), and 1 more.

EditAnything by sail-sg

Top 0.1% on SourcePulse
3k stars
Image editing research paper using segmentation and diffusion
created 2 years ago
updated 5 months ago