OMG by kongzhecn

ECCV 2024 research paper for personalized multi-concept image generation

created 1 year ago
692 stars

Top 50.1% on sourcepulse

Project Summary

OMG is a framework for personalized multi-concept image generation using diffusion models, targeting users who want to generate images with multiple characters or styles. It integrates with LoRAs and InstantID to achieve high-quality, occlusion-friendly results, enabling complex scene compositions with distinct identities and artistic styles.

How It Works

OMG leverages Stable Diffusion XL as its base model, enhanced by LoRAs for character and style customization. It integrates with InstantID for single-image identity preservation and supports ControlNet for layout and pose control. The framework uses visual comprehension models like YoloWorld or GroundingDINO with SAM for precise object and person segmentation, facilitating complex prompt rewriting for multi-element generation.
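The occlusion-friendly, mask-guided composition described above can be illustrated with a minimal sketch. This is not the repository's code: the function name `composite_concepts`, the paste-based blending, and the toy arrays are assumptions made for illustration, and the actual framework blends concepts during diffusion rather than on finished images.

```python
import numpy as np

def composite_concepts(base, concept_layers):
    """Region-wise composition of per-concept renderings onto a base scene.

    base:           (H, W, 3) float array, the scene generated from the
                    rewritten global prompt.
    concept_layers: list of (image, mask) pairs, one per personalized
                    concept; each mask is an (H, W) boolean array such as
                    a segmentation model like SAM would produce. Later
                    entries overwrite earlier ones where masks overlap,
                    which is what makes the result occlusion-friendly.
    """
    out = base.copy()
    for image, mask in concept_layers:
        out[mask] = image[mask]
    return out

# Toy example: a 4x4 "scene" with two overlapping concept regions.
H, W = 4, 4
base = np.zeros((H, W, 3))
char_a = np.full((H, W, 3), 0.5)  # stand-in for a LoRA-rendered character
char_b = np.full((H, W, 3), 1.0)  # stand-in for an InstantID-rendered face

mask_a = np.zeros((H, W), dtype=bool)
mask_a[:, :3] = True              # character A occupies the left columns
mask_b = np.zeros((H, W), dtype=bool)
mask_b[:, 2:] = True              # character B overlaps A in column 2

result = composite_concepts(base, [(char_a, mask_a), (char_b, mask_b)])
```

Listing the concepts in depth order means overlap regions are resolved in favor of the foreground concept, which is the intuition behind handling occlusion with per-concept segmentation masks.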

Quick Start & Requirements

  • Install: Create a conda environment (conda create -n OMG python=3.10.6), activate it (conda activate OMG), install PyTorch (pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2), and then install the requirements (pip install -r requirements.txt). Additional dependencies are required: segment-anything, plus either inference[yolo-world] or GroundingDINO, depending on the chosen visual comprehension backend.
  • Prerequisites: Python 3.10.6, PyTorch 2.0.1, TorchVision 0.15.2. CUDA support is strongly recommended. Pre-trained models for SDXL, ControlNet, InstantID, SAM, and various LoRAs must be downloaded and placed in the checkpoint directory.
  • Setup: Requires downloading multiple large pre-trained models and LoRAs.
  • Docs: Hugging Face Space (demo)

Highlighted Details

  • Supports multi-character generation using LoRAs or single-image personalization with InstantID.
  • Integrates with ControlNet for spatial and pose conditioning.
  • Offers style control via style LoRAs.
  • Provides options for visual comprehension using YoloWorld+EfficientViT SAM or GroundingDINO+SAM.

Maintenance & Community

The project is associated with ECCV 2024 and lists several authors with academic affiliations. Links to Hugging Face Spaces are provided for demos.

Licensing & Compatibility

The repository does not explicitly state a license in the README.

Limitations & Caveats

The setup process requires downloading a significant number of large pre-trained models and LoRAs, which can be time-consuming and resource-intensive. The README does not specify a license, which may impact commercial use.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 8 stars in the last 90 days

Explore Similar Projects

Starred by Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), Omar Sanseviero (DevRel at Google DeepMind), and 1 more.

EditAnything by sail-sg

3k stars. Image editing research paper using segmentation and diffusion. Created 2 years ago, updated 5 months ago.