omini-kontext by Saquib764

Image editing framework with multi-image references

Created 7 months ago

435 stars

Top 68.4% on SourcePulse

Project Summary

This repository provides Omini Kontext, a framework for multi-image, reference-based image editing and generation built on the Flux.1-Kontext-dev model. Using 3D RoPE embeddings, it supports tasks such as inserting a character at a chosen position in an existing scene. It is aimed at researchers and AI artists working on reference-based image manipulation.

How It Works

Omini Kontext modifies the Flux.1-Kontext-dev model by replacing its original 2D embeddings with 3D RoPE embeddings. This technique, inspired by the OminiControl project, enables reference-based image generation and editing. The framework supports LoRA integration for fine-tuning on specific tasks, such as character or product insertion, and exposes adjustable reference_delta values that control where the reference is placed.
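To make the role of a positional offset concrete, here is a minimal sketch in plain Python. It is not taken from the repository. It assumes that each latent token carries a three-component position id (an index axis plus row and column) and that reference_delta is a three-component shift applied to the reference image's ids. The helper make_position_ids and the example delta are hypothetical names and values chosen for illustration.

```python
from typing import List, Tuple

PositionId = Tuple[int, int, int]  # (index axis, row, column)


def make_position_ids(
    height: int,
    width: int,
    delta: Tuple[int, int, int] = (0, 0, 0),
) -> List[PositionId]:
    """Return one (index, row, col) id per token of a height x width grid,
    shifted by `delta` along each of the three axes."""
    d_idx, d_row, d_col = delta
    return [
        (d_idx, row + d_row, col + d_col)
        for row in range(height)
        for col in range(width)
    ]


# Base scene: a 4x4 token grid at the origin of all three axes.
scene_ids = make_position_ids(4, 4)

# Reference image: a 2x2 token grid moved one step along the index axis
# and shifted 2 rows down and 1 column right (an assumed reference_delta).
reference_delta = (1, 2, 1)
reference_ids = make_position_ids(2, 2, delta=reference_delta)

# Both token sets would be concatenated into a single sequence, so the
# rotary embedding sees the reference at its shifted coordinates.
all_ids = scene_ids + reference_ids
print(reference_ids)
```

Changing the row and column components of the delta moves the reference's coordinates relative to the scene grid, which is the intuition behind using reference_delta to control placement. The project's actual tensor layout and units may differ.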

Quick Start & Requirements

Installation: Clone the repository and install dependencies with pip install -r requirements.txt. For ComfyUI integration, clone the repo into ComfyUI/custom_nodes.
Prerequisites: Python 3.8+, CUDA-compatible GPU (24GB+ VRAM recommended), PyTorch 2.0+, Hugging Face account.
Setup: After installing dependencies, load the base model. For ComfyUI, the repo must be in the custom nodes directory.
Links: Live Demo, Replicate Version, ComfyUI Integration.
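The prerequisites listed above can be checked before installing anything else. The following is a small sketch, not part of the project: check_prerequisites is a hypothetical helper, and treating 24GB as 24 x 10^9 bytes is an assumption made here. It only uses standard PyTorch calls and degrades gracefully when PyTorch is not installed.

```python
import sys


def check_prerequisites(min_vram_gb: float = 24.0) -> dict:
    """Report which of the listed prerequisites this machine satisfies."""
    report = {
        "python_ok": sys.version_info >= (3, 8),
        "torch_ok": False,
        "cuda_ok": False,
        "vram_recommended": False,
    }
    try:
        import torch
    except ImportError:
        # PyTorch is missing, so the remaining checks cannot run.
        return report

    # torch.__version__ looks like "2.1.0+cu118"; compare the major version.
    report["torch_ok"] = int(torch.__version__.split(".")[0]) >= 2

    if torch.cuda.is_available():
        report["cuda_ok"] = True
        total_bytes = torch.cuda.get_device_properties(0).total_memory
        # Decimal gigabytes (assumption), since the 24GB figure is a
        # recommendation rather than a hard limit.
        report["vram_recommended"] = total_bytes / 1e9 >= min_vram_gb
    return report


report = check_prerequisites()
print(report)
```

A False value for vram_recommended does not necessarily mean the model cannot run, since the 24GB figure is given only as a recommendation.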

Highlighted Details

Enables spatial character insertion with control over placement via reference_delta.
Offers pre-trained LoRA models for character and product insertion.
Provides ComfyUI custom nodes for seamless integration into existing workflows.
Includes scripts for data preparation and training, supporting multi-GPU setups and checkpoint resuming.

Maintenance & Community

The project welcomes community contributions. Discussions and support are primarily handled through GitHub Issues and Discussions.

Licensing & Compatibility

Licensed under the Apache License 2.0, allowing for commercial use and integration with closed-source projects.

Limitations & Caveats

The current character insertion examples are described as proofs of concept (POCs); more robust models trained on larger datasets are planned. Some results may be suboptimal, and users might need to adjust image resolutions for better scaling. Planned extensions include support for multiple references and for Qwen-Image-Edit.

Health Check

Last Commit: 5 months ago
Responsiveness: Inactive

Star History: 2 stars in the last 30 days