ECCV 2024 research paper for personalized multi-concept image generation
OMG is a framework for personalized multi-concept image generation using diffusion models, targeting users who want to generate images with multiple characters or styles. It integrates with LoRAs and InstantID to achieve high-quality, occlusion-friendly results, enabling complex scene compositions with distinct identities and artistic styles.
How It Works
OMG uses Stable Diffusion XL as its base model, enhanced by LoRAs for character and style customization. It integrates with InstantID for single-image identity preservation and supports ControlNet for layout and pose control. For multi-concept generation, the framework relies on visual-comprehension models (YOLO-World or GroundingDINO, combined with SAM) to obtain precise person and object segmentations, which in turn support the prompt rewriting needed to compose multiple elements in one image.
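The segmentation-guided composition described above can be illustrated schematically: each concept's denoising prediction only influences the latent region its mask covers, which is what keeps identities distinct even under occlusion. The sketch below is a minimal NumPy illustration of this mask-guided latent blending idea, not OMG's actual implementation; the function name `blend_latents` and all shapes are hypothetical.

```python
import numpy as np

def blend_latents(background, concept_latents, masks):
    """Compose per-concept latent predictions into one latent.

    background:      (C, H, W) latent from the global prompt
    concept_latents: list of (C, H, W) latents, one per concept
    masks:           list of (H, W) binary masks from a segmentation
                     model (e.g. SAM), one marking each concept's region
    """
    out = background.copy()
    for latent, mask in zip(concept_latents, masks):
        m = mask[None, :, :]             # broadcast mask over channels
        out = np.where(m > 0, latent, out)  # concept wins inside its mask
    return out

# Toy example: 1-channel 4x4 latents, two concepts in disjoint regions.
bg = np.zeros((1, 4, 4))
c1 = np.ones((1, 4, 4))                  # stand-in for concept 1's latent
c2 = np.full((1, 4, 4), 2.0)             # stand-in for concept 2's latent
m1 = np.zeros((4, 4)); m1[:, :2] = 1     # left half -> concept 1
m2 = np.zeros((4, 4)); m2[:, 2:] = 1     # right half -> concept 2
blended = blend_latents(bg, [c1, c2], [m1, m2])
```

In the real pipeline the blending happens on noisy diffusion latents at each denoising step, and the masks come from the detection/segmentation stage rather than being hand-built.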
Quick Start & Requirements
Create a conda environment with `conda create -n OMG python=3.10.6`, activate it with `conda activate OMG`, install PyTorch with `pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2`, and then install the remaining requirements with `pip install -r requirements.txt`. Additional dependencies such as `segment-anything` and, depending on the chosen detector, `inference[yolo-world]` or `GroundingDINO` are also required. Pre-trained model weights must be downloaded into the `checkpoint` directory.
Maintenance & Community
The project is associated with ECCV 2024 and lists several authors with academic affiliations. Links to Hugging Face Spaces are provided for demos.
Licensing & Compatibility
The repository does not explicitly state a license in the README.
Limitations & Caveats
The setup process requires downloading a significant number of large pre-trained models and LoRAs, which can be time-consuming and resource-intensive. The README does not specify a license, which may impact commercial use.