MIGC by limuloo

Text-to-image synthesis research paper using multi-instance generation control

created 1 year ago
606 stars

Top 54.8% on sourcepulse

Project Summary

MIGC and MIGC++ offer advanced control over text-to-image synthesis, enabling users to specify multiple object instances with precise locations and attributes. This is particularly beneficial for researchers and artists requiring fine-grained control over generated imagery, moving beyond simple text prompts to detailed scene composition.

How It Works

MIGC acts as a plug-and-play controller for diffusion models, leveraging spatial conditioning (masks and boxes) to guide instance generation. MIGC++ enhances this by supporting simultaneous box and mask control, and introduces an iterative editing mode ("Consistent-MIG") for modifying specific instances while maintaining overall image consistency. This approach improves instance success rates and reduces attribute leakage compared to baseline methods.
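
The spatial conditioning described above can be illustrated with a minimal sketch that turns one instance's bounding box into a binary mask. This is not the project's own code: the function name box_to_mask, the (x0, y0, x1, y1) normalized box convention, and the example instance descriptions are all assumptions made for illustration.

```python
from typing import List, Tuple

# Assumed convention: (x0, y0, x1, y1), each normalized to [0, 1].
Box = Tuple[float, float, float, float]


def box_to_mask(box: Box, height: int, width: int) -> List[List[int]]:
    """Rasterize a normalized box into a height x width grid of 0/1 values."""
    x0, y0, x1, y1 = box
    # Scale to pixel indices and clamp to the grid bounds.
    c0 = max(0, min(width, int(x0 * width)))
    c1 = max(0, min(width, int(x1 * width)))
    r0 = max(0, min(height, int(y0 * height)))
    r1 = max(0, min(height, int(y1 * height)))
    return [
        [1 if (r0 <= r < r1 and c0 <= c < c1) else 0 for c in range(width)]
        for r in range(height)
    ]


# Hypothetical layout: each instance pairs a description with its box.
instances = [
    ("a red apple", (0.0, 0.0, 0.5, 0.5)),
    ("a green bottle", (0.5, 0.5, 1.0, 1.0)),
]
masks = {text: box_to_mask(box, 8, 8) for text, box in instances}
```

Keeping one mask per instance is what lets a controller restrict each description to its own region; how MIGC consumes such regions internally is not shown here.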

Quick Start & Requirements

  • Installation: Requires Python 3.9 in a Conda environment. Install the dependencies with pip install -r requirement.txt, then install the package in editable mode with pip install -e . from the repository root.
  • Checkpoints: Download MIGC_SD14.ckpt (219M) or MIGC++_SD14.ckpt (191M) and place in pretrained_weights/. Note: MIGC_SD14.ckpt can be used with SD1.5.
  • Dependencies: Stable Diffusion, diffusers, CLIP, GLIGEN-GUI.
  • Demo: Colab demo available.
  • GUI: A MIGC-GUI is provided, requiring additional model downloads (CLIPTextModel, CetusMix).
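
Before running anything, it can help to confirm that the checkpoints are where the steps above say they should be. The sketch below is a hypothetical helper (find_checkpoints is not part of the repository); it only assumes the file names and the pretrained_weights/ directory listed above.

```python
from pathlib import Path
from typing import Dict

# File names taken from the checkpoint list above.
EXPECTED_CHECKPOINTS = ("MIGC_SD14.ckpt", "MIGC++_SD14.ckpt")


def find_checkpoints(weights_dir: str = "pretrained_weights") -> Dict[str, bool]:
    """Report which of the expected checkpoint files exist in weights_dir."""
    root = Path(weights_dir)
    return {name: (root / name).is_file() for name in EXPECTED_CHECKPOINTS}


if __name__ == "__main__":
    for name, present in find_checkpoints().items():
        print(f"{name}: {'found' if present else 'missing'}")
```

Only one of the two files is needed, depending on whether MIGC or MIGC++ is being used.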

Highlighted Details

  • Achieves state-of-the-art performance on the COCO-MIG benchmark, outperforming methods like InstanceDiffusion in MIOU and Instance Success Rate.
  • Supports integration with LoRA for enhanced attribute and position control.
  • Offers an iterative editing mode (Consistent-MIG) for targeted image modifications.
  • Provides a GUI for more convenient art creation.
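
To make the two metrics named above concrete, here is a rough sketch of box-level IoU, its mean over instances, and a threshold-based success rate. It is an assumption-laden illustration, not the COCO-MIG evaluation code: the 0.5 threshold, the function names, and the idea that a missed instance is represented by None are all hypothetical, and the benchmark's real protocol may include further checks (for example on attributes).

```python
from typing import List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def box_iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def per_instance_iou(targets: Sequence[Box], detections: Sequence[Optional[Box]]) -> List[float]:
    """IoU per requested instance; a missing detection (None) scores 0."""
    return [0.0 if d is None else box_iou(t, d) for t, d in zip(targets, detections)]


def mean_iou(targets: Sequence[Box], detections: Sequence[Optional[Box]]) -> float:
    ious = per_instance_iou(targets, detections)
    return sum(ious) / len(ious) if ious else 0.0


def instance_success_rate(targets: Sequence[Box], detections: Sequence[Optional[Box]],
                          threshold: float = 0.5) -> float:
    """Fraction of instances whose IoU reaches the (assumed) threshold."""
    ious = per_instance_iou(targets, detections)
    return sum(iou >= threshold for iou in ious) / len(ious) if ious else 0.0
```

Here targets would be the boxes requested in the layout and detections the boxes found in the generated image by some detector, which is outside the scope of this sketch.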

Maintenance & Community

The project is supervised by ReLER Lab at Zhejiang University and HUAWEI. Key contributors include Dewei Zhou and Yi Yang. Updates include the release of MIGC++ weights and the iterative editing mode. Contact email: zdw1999@zju.edu.cn.

Licensing & Compatibility

The repository is released under a permissive license, allowing for commercial use and integration with closed-source projects.

Limitations & Caveats

Training code is not available due to company requirements. Pretrained weights for SD1.5, SD2, and SDXL are listed as "coming soon." The MIGC-GUI is still under optimization.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 16 stars in the last 90 days
