zhongpei/Comfyui-image2prompt: ComfyUI nodes for image-to-prompt workflows
Top 75.0% on SourcePulse
This ComfyUI plugin improves image-to-prompt generation by combining several vision-language models and taggers to produce descriptions tailored to either characters or scenes. It targets ComfyUI users who want more accurate and detailed prompts for text-to-image generation, and it can also generate prompts with 7B-class models.
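For readers new to ComfyUI plugins: a node is a Python class that ComfyUI discovers through a module-level NODE_CLASS_MAPPINGS dict, and its inputs, outputs, and entry method are declared with INPUT_TYPES, RETURN_TYPES, and FUNCTION. The sketch below is a hypothetical node that joins a tag string and a caption string. It is not one of this plugin's actual nodes; the class name, category, and inputs are assumptions made for illustration.

```python
# Hypothetical ComfyUI custom node for illustration; not taken from Comfyui-image2prompt.
class JoinTagsAndCaption:
    """Combine a tagger's tag string and a VLM caption into a single prompt string."""

    @classmethod
    def INPUT_TYPES(cls):
        # ComfyUI reads this dict to build the node's input widgets/sockets.
        return {
            "required": {
                "tags": ("STRING", {"multiline": True, "default": ""}),
                "caption": ("STRING", {"multiline": True, "default": ""}),
            }
        }

    RETURN_TYPES = ("STRING",)
    RETURN_NAMES = ("prompt",)
    FUNCTION = "join"                    # name of the method ComfyUI calls
    CATEGORY = "image2prompt/examples"   # hypothetical menu category

    def join(self, tags, caption):
        # Caption first, then tags; empty or whitespace-only parts are skipped.
        parts = [p.strip() for p in (caption, tags) if p.strip()]
        # Node methods return a tuple whose length matches RETURN_TYPES.
        return (", ".join(parts),)


# ComfyUI looks for these module-level mappings in each custom_nodes package.
NODE_CLASS_MAPPINGS = {"JoinTagsAndCaption": JoinTagsAndCaption}
NODE_DISPLAY_NAME_MAPPINGS = {"JoinTagsAndCaption": "Join Tags and Caption (example)"}
```

If a package under custom_nodes exported these mappings from its __init__.py, the node would appear in ComfyUI's node menu after a restart.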
How It Works
The plugin combines several models: wd-swinv2-tagger-v3 for precise character tagging, moondream1 for detailed scene descriptions, and moondream2 for concise scene descriptions. It also provides a Text2GPTPrompt node that generates optimized prompts with 7B models such as qwen1.5-7b and deepseek-ai/deepseek-vl-7b-chat, and it can use fine-tuned models such as hahafofo/Qwen-1_8B-Stable-Diffusion-Prompt for more varied prompt generation, including prompts based on classical poetry.
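This page does not say exactly how the plugin merges tagger output with a moondream caption. The following is a minimal sketch of one common approach, assuming the tagger returns a per-tag confidence score (as WD-style taggers do) and the vision-language model returns a plain caption. The 0.35 threshold, the max_tags limit, and the function names are assumptions, not the plugin's settings.

```python
from typing import Mapping


def tags_to_fragment(tag_scores: Mapping[str, float],
                     threshold: float = 0.35,   # assumed cut-off, not the plugin's value
                     max_tags: int = 20) -> str:
    """Keep tags whose score meets the threshold, highest score first."""
    kept = sorted(
        ((tag, score) for tag, score in tag_scores.items() if score >= threshold),
        key=lambda ts: ts[1],
        reverse=True,
    )[:max_tags]
    # WD-style tags use underscores; prompts usually read better with spaces.
    return ", ".join(tag.replace("_", " ") for tag, _ in kept)


def build_prompt(caption: str, tag_scores: Mapping[str, float],
                 threshold: float = 0.35) -> str:
    """Caption first (trailing period dropped), then the filtered tags."""
    tags = tags_to_fragment(tag_scores, threshold)
    caption = caption.strip().rstrip(".")
    return ", ".join(part for part in (caption, tags) if part)


scores = {"1girl": 0.98, "long_hair": 0.91, "outdoors": 0.62, "hat": 0.20}
print(build_prompt("A young woman standing in a sunlit garden.", scores))
# A young woman standing in a sunlit garden, 1girl, long hair, outdoors
```

Putting the caption first keeps the scene-level description at the start of the prompt, while the tags add the character details that a tagger tends to capture more precisely.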
Quick Start & Requirements
Install: git clone https://github.com/zhongpei/Comfyui-image2prompt into ComfyUI's custom_nodes directory.
If the automatic download fails, download the models (vikhyatk/moondream1, vikhyatk/moondream2, internlm/internlm-xcomposer2-vl-7b, unum-cloud/uform-gen2-qwen-500m) manually into ComfyUI/models/image2text.
huggingface-cli can also be used to download them.
Highlighted Details
wd-swinv2-tagger-v3 improves the accuracy of character descriptions.
Recommended pairings: moondream1 + wd-swinv2-tagger-v3 for scenes, moondream2 + wd-swinv2-tagger-v3 for characters.
The Text2GPTPrompt node customizes prompts for 7B models.
Supports roborovski/superprompt-v1 and integrates ImageReward for aesthetic evaluation.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Manual model downloading may be required if automatic download fails. The README suggests specific model pairings for optimal results, implying that other combinations might be less effective.
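When the automatic download fails, the manual step can be scripted. The sketch below assumes each model lives in its own subfolder of ComfyUI/models/image2text named after the last part of its Hugging Face repo ID; that layout, and the helper names, are assumptions to check against the plugin's README. It uses huggingface_hub.snapshot_download, which downloads a full repository snapshot into local_dir.

```python
from pathlib import Path

# Model repos named in the Quick Start notes.
MODEL_REPOS = [
    "vikhyatk/moondream1",
    "vikhyatk/moondream2",
    "internlm/internlm-xcomposer2-vl-7b",
    "unum-cloud/uform-gen2-qwen-500m",
]


def local_model_dir(comfyui_root: Path, repo_id: str) -> Path:
    # ASSUMPTION: one subfolder per model, named after the repo's last path component.
    return comfyui_root / "models" / "image2text" / repo_id.split("/")[-1]


def missing_models(comfyui_root: Path, repos=MODEL_REPOS) -> list:
    """Return the repo IDs whose assumed local folder does not exist yet."""
    return [r for r in repos if not local_model_dir(comfyui_root, r).is_dir()]


def download_missing(comfyui_root: Path, repos=MODEL_REPOS) -> None:
    # Imported here so the path helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    for repo_id in missing_models(comfyui_root, repos):
        target = local_model_dir(comfyui_root, repo_id)
        print(f"Downloading {repo_id} -> {target}")
        snapshot_download(repo_id=repo_id, local_dir=str(target))
```

Calling download_missing(Path("ComfyUI")) fetches only the models whose folders are absent. The equivalent CLI call for one model is huggingface-cli download vikhyatk/moondream2 --local-dir ComfyUI/models/image2text/moondream2, again assuming that folder layout.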