ComfyUI_omost by huchenlei

ComfyUI nodes for regional prompt-driven image generation

created 1 year ago
448 stars

Top 68.1% on sourcepulse

Project Summary

This repository provides ComfyUI nodes for Omost, a framework for regional prompting in diffusion models. It enables users to interact with Large Language Models (LLMs) to generate structured prompts that define specific regions and content within an image, facilitating precise control over image generation.

How It Works

The core functionality revolves around an LLM Chat interface, where users converse with an LLM to produce a JSON-like structure detailing image regions, associated prompts (prefixes/suffixes), and color information. This structured data then guides the diffusion process. The implementation supports multiple regional prompting methods, including built-in attention masking (Overlay/Average) and integration with Dense Diffusion for more advanced attention score manipulation.
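As a sketch of the two ideas above, the snippet below models one region entry from the LLM's structured output and shows how the two built-in masking modes could combine overlapping region masks. The field names and the exact combination rules are illustrative assumptions, not the verbatim Omost schema or implementation.

```python
import numpy as np

# Hypothetical shape of one region entry in the LLM's JSON-like output.
# Field names are illustrative; the real Omost schema may differ.
region = {
    "rect": (0, 0, 32, 32),          # (y0, x0, y1, x1) in latent coordinates
    "prefixes": ["a cozy cabin"],    # broader-context prompt fragments
    "suffixes": ["warm lighting"],   # region-specific prompt fragments
    "color": (120, 80, 40),          # preview color for the canvas editor
}

def region_mask(rect, h, w):
    """Binary mask covering one region's rectangle."""
    y0, x0, y1, x1 = rect
    m = np.zeros((h, w), dtype=np.float32)
    m[y0:y1, x0:x1] = 1.0
    return m

def combine_masks(masks, mode="average"):
    """Combine per-region masks in the spirit of the two built-in modes:
    'overlay' lets later regions overwrite earlier ones where they overlap,
    'average' normalizes each pixel by its total coverage."""
    h, w = masks[0].shape
    if mode == "overlay":
        out = np.zeros((len(masks), h, w), dtype=np.float32)
        covered = np.zeros((h, w), dtype=np.float32)
        for i in reversed(range(len(masks))):  # last region wins
            out[i] = masks[i] * (1.0 - covered)
            covered = np.maximum(covered, masks[i])
        return out
    total = np.maximum(sum(masks), 1e-6)       # avoid division by zero
    return np.stack([m / total for m in masks])
```

Each per-region mask would then weight that region's prompt conditioning during attention, which is what distinguishes these masking modes from Dense Diffusion's direct manipulation of attention scores.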

Quick Start & Requirements

  • Install via ComfyUI custom nodes.
  • Requires ComfyUI.
  • For accelerated LLM inference, Text Generation Inference (TGI) or llama.cpp with GGUF models is recommended. TGI requires ~20GB VRAM for an 8B LLM.
  • Official documentation: https://github.com/huchenlei/ComfyUI_omost
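For the first bullet, a typical manual custom-node install looks like the following; the paths assume a standard ComfyUI checkout, and the dependency step applies only if the node ships a `requirements.txt`:

```shell
# From the root of an existing ComfyUI installation
cd ComfyUI/custom_nodes
git clone https://github.com/huchenlei/ComfyUI_omost

# Install Python dependencies, if the repository provides them
pip install -r ComfyUI_omost/requirements.txt
```

Restart ComfyUI afterwards so the new nodes are registered. The ComfyUI Manager can perform the same steps from the UI.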

Highlighted Details

  • LLM Chat nodes for interactive prompt generation.
  • Canvas editor for manual region and prompt manipulation.
  • Support for OmostDenseDiffusion backend for advanced regional control.
  • Options to connect to external LLM services (TGI, llama.cpp) for accelerated inference.
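To illustrate the last bullet, here is a minimal sketch of calling a running TGI server over its public HTTP API (`/generate` with an `inputs`/`parameters` JSON body). The URL, prompt, and generation parameters are placeholder assumptions for a locally hosted instance:

```python
import json
import urllib.request

TGI_URL = "http://localhost:8080/generate"  # assumption: local TGI instance

def build_payload(prompt: str, max_new_tokens: int = 512) -> dict:
    """Build the JSON body TGI expects for one generation request."""
    return {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens, "temperature": 0.6},
    }

def generate(prompt: str) -> str:
    """POST the prompt to TGI and return the generated text.

    Requires a live TGI server; TGI responds with {"generated_text": ...}.
    """
    req = urllib.request.Request(
        TGI_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["generated_text"]
```

A llama.cpp server exposes a similar HTTP endpoint, so the same pattern applies with a different URL and payload shape.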

Maintenance & Community

  • Active development with recent updates in June 2024.
  • The README does not list any community channels (e.g., a Discord server or discussion forum).

Licensing & Compatibility

  • The repository itself appears to be under a permissive license, but it integrates with other projects (Omost, ComfyUI, DenseDiffusion, TGI, llama.cpp) which have their own licenses. Users should verify compatibility, especially for commercial use.

Limitations & Caveats

  • Some advanced regional prompting methods (gradient optimization, external control models) are listed as "To be implemented."
  • The base LLM inference can be slow (3-5 minutes per chat on a 4090) without acceleration.
  • ComfyUI_densediffusion does not compose with IPAdapter.
Health Check

  • Last commit: 5 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 10 stars in the last 90 days
