OMG by kongzhecn

ECCV 2024 research paper for personalized multi-concept image generation

created 1 year ago
692 stars

Top 50.1% on sourcepulse

Project Summary

OMG is a framework for personalized multi-concept image generation using diffusion models, targeting users who want to generate images with multiple characters or styles. It integrates with LoRAs and InstantID to achieve high-quality, occlusion-friendly results, enabling complex scene compositions with distinct identities and artistic styles.

How It Works

OMG leverages Stable Diffusion XL as its base model, enhanced by LoRAs for character and style customization. It integrates with InstantID for single-image identity preservation and supports ControlNet for layout and pose control. The framework uses visual comprehension models like YoloWorld or GroundingDINO with SAM for precise object and person segmentation, facilitating complex prompt rewriting for multi-element generation.
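The occlusion-friendly, mask-guided composition described above can be illustrated with a minimal sketch. This is not the repository's code: the function name `composite_concepts`, the paste-based blending, and the toy arrays are assumptions made for illustration, and the actual framework blends concepts during diffusion rather than on finished images.

```python
import numpy as np

def composite_concepts(base, concept_layers):
    """Region-wise composition of per-concept renderings onto a base scene.

    base:           (H, W, 3) float array, the scene generated from the
                    rewritten global prompt.
    concept_layers: list of (image, mask) pairs, one per personalized
                    concept; each mask is an (H, W) boolean array such as
                    a segmentation model like SAM would produce. Later
                    entries overwrite earlier ones where masks overlap,
                    which is what makes the result occlusion-friendly.
    """
    out = base.copy()
    for image, mask in concept_layers:
        out[mask] = image[mask]
    return out

# Toy example: a 4x4 "scene" with two overlapping concept regions.
H, W = 4, 4
base = np.zeros((H, W, 3))
char_a = np.full((H, W, 3), 0.5)  # stand-in for a LoRA-rendered character
char_b = np.full((H, W, 3), 1.0)  # stand-in for an InstantID-rendered face

mask_a = np.zeros((H, W), dtype=bool)
mask_a[:, :3] = True              # character A occupies the left columns
mask_b = np.zeros((H, W), dtype=bool)
mask_b[:, 2:] = True              # character B overlaps A in column 2

result = composite_concepts(base, [(char_a, mask_a), (char_b, mask_b)])
```

Listing the concepts in depth order means overlap regions are resolved in favor of the foreground concept, which is the intuition behind handling occlusion with per-concept segmentation masks.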

Quick Start & Requirements

  • Install: Create a conda environment (conda create -n OMG python=3.10.6), activate it (conda activate OMG), install PyTorch (pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2), and then install the requirements (pip install -r requirements.txt). Additional dependencies are required: segment-anything, plus either inference[yolo-world] or GroundingDINO, depending on the chosen visual comprehension backend.
  • Prerequisites: Python 3.10.6, PyTorch 2.0.1, TorchVision 0.15.2. CUDA support is strongly recommended. Pre-trained models for SDXL, ControlNet, InstantID, SAM, and various LoRAs must be downloaded and placed in the checkpoint directory.
  • Setup: Requires downloading multiple large pre-trained models and LoRAs.
  • Docs: Hugging Face Space (demo)

Highlighted Details

  • Supports multi-character generation using LoRAs or single-image personalization with InstantID.
  • Integrates with ControlNet for spatial and pose conditioning.
  • Offers style control via style LoRAs.
  • Provides options for visual comprehension using YoloWorld+EfficientViT SAM or GroundingDINO+SAM.

Maintenance & Community

The project is associated with ECCV 2024 and lists several authors with academic affiliations. Links to Hugging Face Spaces are provided for demos.

Licensing & Compatibility

The repository does not explicitly state a license in the README.

Limitations & Caveats

The setup process requires downloading a significant number of large pre-trained models and LoRAs, which can be time-consuming and resource-intensive. The README does not specify a license, which may impact commercial use.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 90 days

Explore Similar Projects

Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Omar Sanseviero (DevRel at Google DeepMind), and 1 more.

EditAnything by sail-sg

3k stars. Image editing research paper using segmentation and diffusion. Created 2 years ago, updated 5 months ago.