ComfyUI nodes for image-to-prompt workflows
This ComfyUI plugin enhances image-to-prompt generation by integrating multiple vision-language models and taggers, offering tailored descriptions for characters and scenes. It targets ComfyUI users seeking more accurate and detailed prompts for text-to-image generation, particularly for 7B-level diffusion models.
How It Works
The plugin leverages a combination of models: wd-swinv2-tagger-v3 for precise character tagging, moondream1 for detailed scene descriptions, and moondream2 for concise scene descriptions. It also incorporates Text2GPTPrompt to create optimized prompts for 7B models like qwen1.5-7b and deepseek-ai/deepseek-vl-7b-chat, utilizing fine-tuned models like hahafofo/Qwen-1_8B-Stable-Diffusion-Prompt for diverse prompt generation, including classical poetry.
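To make the tagger-plus-captioner idea concrete, here is a minimal sketch of how tag output and a scene caption could be merged into one prompt string. This is not the plugin's code: the function name `merge_tags_and_caption`, the `max_tags` parameter, and the normalization rules (lowercasing, replacing underscores with spaces) are all assumptions made for illustration.

```python
def merge_tags_and_caption(tags, caption, max_tags=20):
    """Combine tagger output and a caption into one comma-separated prompt.

    Hypothetical helper for illustration; the plugin's nodes may combine
    their outputs differently.
    """
    seen = set()
    unique_tags = []
    for tag in tags:
        # Normalize booru-style tags such as "long_hair" to "long hair".
        normalized = tag.strip().lower().replace("_", " ")
        if normalized and normalized not in seen:
            seen.add(normalized)
            unique_tags.append(normalized)

    # Keep at most max_tags tags, in their original order.
    parts = unique_tags[:max_tags]

    # Append the caption last, without surrounding whitespace or a final period.
    caption = caption.strip().rstrip(".")
    if caption:
        parts.append(caption)
    return ", ".join(parts)
```

Under these assumptions, calling it with the tags `["1girl", "long_hair", "1girl"]` and the caption `"A girl standing in a field."` yields `1girl, long hair, A girl standing in a field`.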
Quick Start & Requirements
Clone the repository with git clone https://github.com/zhongpei/Comfyui-image2prompt into ComfyUI's custom_nodes directory.
Download the models (vikhyatk/moondream1, vikhyatk/moondream2, internlm/internlm-xcomposer2-vl-7b, unum-cloud/uform-gen2-qwen-500m) into ComfyUI/models/image2text.
huggingface-cli is an alternative way to download them.
Highlighted Details
wd-swinv2-tagger-v3 enhances character description accuracy.
Suggested pairings: moondream1 + wd-swinv2-tagger-v3 for scenes, moondream2 + wd-swinv2-tagger-v3 for characters.
The Text2GPTPrompt node customizes prompts for 7B models.
Supports roborovski/superprompt-v1 and integrates ImageReward for aesthetic evaluation.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Manual model downloading may be required if automatic download fails. The README suggests specific model pairings for optimal results, implying that other combinations might be less effective.
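If automatic download fails, one way to prepare a manual download is to generate the huggingface-cli commands for each model. The sketch below only builds and prints the commands; it does not run them. The per-model subfolder name (the last segment of the repository ID) and the helper name `download_commands` are assumptions, so check the plugin's README for the folder layout it actually expects.

```python
from pathlib import Path

# Model repositories listed in the Quick Start section.
MODEL_REPOS = [
    "vikhyatk/moondream1",
    "vikhyatk/moondream2",
    "internlm/internlm-xcomposer2-vl-7b",
    "unum-cloud/uform-gen2-qwen-500m",
]


def download_commands(comfyui_root, repos=MODEL_REPOS):
    """Build one `huggingface-cli download` command per model repository.

    Assumption: each model goes into its own subfolder of
    ComfyUI/models/image2text, named after the last part of the repo ID.
    """
    target_root = Path(comfyui_root) / "models" / "image2text"
    commands = []
    for repo_id in repos:
        local_dir = target_root / repo_id.split("/")[-1]
        commands.append(
            ["huggingface-cli", "download", repo_id,
             "--local-dir", local_dir.as_posix()]
        )
    return commands


# Print the commands so they can be reviewed before being run by hand.
for cmd in download_commands("ComfyUI"):
    print(" ".join(cmd))
```

Printing rather than executing keeps the sketch free of network access and lets the target paths be adjusted first.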