zhongpei: ComfyUI nodes for image-to-prompt workflows
This ComfyUI plugin improves image-to-prompt generation by combining several vision-language models and taggers, producing descriptions tailored to characters or to scenes. It is aimed at ComfyUI users who want more accurate and detailed prompts for text-to-image generation, including prompts tailored for 7B-class models.
How It Works
The plugin combines several models: wd-swinv2-tagger-v3 for precise character tagging, moondream1 for detailed scene descriptions, and moondream2 for concise scene descriptions. A Text2GPTPrompt node creates optimized prompts for 7B models such as qwen1.5-7b and deepseek-ai/deepseek-vl-7b-chat, and fine-tuned models such as hahafofo/Qwen-1_8B-Stable-Diffusion-Prompt add more varied prompt generation, including classical poetry.
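To make the tagger-plus-captioner idea concrete, here is a minimal sketch of how tagger output and a caption might be merged into one prompt string. It is not the plugin's code: the function names are hypothetical, the `scores` dict stands in for the tag-to-confidence output of a tagger such as wd-swinv2-tagger-v3, and the 0.35 threshold is an assumed cutoff rather than a value taken from the plugin.

```python
from typing import Dict, List


def select_tags(scores: Dict[str, float], threshold: float = 0.35) -> List[str]:
    """Keep tags whose confidence meets the (assumed) threshold, highest score first."""
    kept = [(tag, s) for tag, s in scores.items() if s >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    # Tagger labels often use underscores; prompts usually use spaces.
    return [tag.replace("_", " ") for tag, _ in kept]


def build_prompt(caption: str, scores: Dict[str, float], threshold: float = 0.35) -> str:
    """Join the caption (without its trailing period) and the selected tags with commas."""
    parts = [caption.strip().rstrip(".")] if caption.strip() else []
    parts.extend(select_tags(scores, threshold))
    return ", ".join(parts)


if __name__ == "__main__":
    # Hypothetical tagger scores and caption, standing in for real model output.
    tags = {"1girl": 0.98, "long_hair": 0.91, "outdoors": 0.62, "hat": 0.12}
    caption = "A woman standing in a sunlit park."
    print(build_prompt(caption, tags))
```

With the sample inputs this prints `A woman standing in a sunlit park, 1girl, long hair, outdoors`; the low-confidence `hat` tag is dropped. Putting the free-text caption first and the tags after it is one reasonable ordering, not necessarily the one the plugin uses.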
Quick Start & Requirements
git clone https://github.com/zhongpei/Comfyui-image2prompt into ComfyUI's custom_nodes directory.
Models (vikhyatk/moondream1, vikhyatk/moondream2, internlm/internlm-xcomposer2-vl-7b, unum-cloud/uform-gen2-qwen-500m) go into ComfyUI/models/image2text.
huggingface-cli is an alternative way to fetch them.
Highlighted Details
wd-swinv2-tagger-v3 improves the accuracy of character descriptions.
Recommended pairings: moondream1 + wd-swinv2-tagger-v3 for scenes, moondream2 + wd-swinv2-tagger-v3 for characters.
The Text2GPTPrompt node customizes prompts for 7B models.
Supports roborovski/superprompt-v1 and integrates ImageReward for aesthetic evaluation.
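The recommended scene and character pairings can be written down as a small lookup table, which is handy if you script workflow generation. This is a hypothetical sketch: `RECOMMENDED_PAIRS` and `pick_models` are illustrative names, not settings or functions exposed by the plugin, and the entries only restate the pairings listed in this section.

```python
from typing import Dict, Tuple

# Caption model + tagger per subject type, as recommended in this section.
RECOMMENDED_PAIRS: Dict[str, Tuple[str, str]] = {
    "scene": ("moondream1", "wd-swinv2-tagger-v3"),
    "character": ("moondream2", "wd-swinv2-tagger-v3"),
}


def pick_models(subject: str) -> Tuple[str, str]:
    """Return (caption_model, tagger) for 'scene' or 'character' (case-insensitive)."""
    key = subject.strip().lower()
    if key not in RECOMMENDED_PAIRS:
        raise ValueError(
            f"unknown subject {subject!r}; expected one of {sorted(RECOMMENDED_PAIRS)}"
        )
    return RECOMMENDED_PAIRS[key]


if __name__ == "__main__":
    caption_model, tagger = pick_models("character")
    print(caption_model, tagger)
```

Raising on unknown subjects, instead of silently falling back to a default, keeps an unsupported combination from slipping into a workflow unnoticed.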
Limitations & Caveats
Manual model downloading may be required if automatic download fails. The README suggests specific model pairings for optimal results, implying that other combinations might be less effective.
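Before downloading by hand, it can help to list which model folders are actually missing. This is a minimal sketch that assumes each model sits in a subfolder of ComfyUI/models/image2text named after the last part of its Hugging Face repo id (for example, `moondream2`). The plugin's real folder naming may differ, and `missing_models` is a hypothetical helper, not part of the plugin. The suggested `huggingface-cli download <repo> --local-dir <dir>` command is the standard huggingface_hub CLI form.

```python
from pathlib import Path
from typing import Iterable, List

# Repo ids named in the Quick Start section.
MODEL_REPOS = [
    "vikhyatk/moondream1",
    "vikhyatk/moondream2",
    "internlm/internlm-xcomposer2-vl-7b",
    "unum-cloud/uform-gen2-qwen-500m",
]


def missing_models(comfy_root: Path, repo_ids: Iterable[str]) -> List[str]:
    """Return the repo ids whose assumed local folder is absent or empty."""
    base = comfy_root / "models" / "image2text"
    missing = []
    for repo_id in repo_ids:
        # ASSUMPTION: folder name = last component of the repo id.
        folder = base / repo_id.split("/")[-1]
        if not folder.is_dir() or not any(folder.iterdir()):
            missing.append(repo_id)
    return missing


if __name__ == "__main__":
    root = Path("ComfyUI")
    for repo_id in missing_models(root, MODEL_REPOS):
        target = root / "models" / "image2text" / repo_id.split("/")[-1]
        print(f"missing: {repo_id} -> huggingface-cli download {repo_id} --local-dir {target}")
```

Treating an empty folder as missing catches the common case where an interrupted automatic download left only the directory behind.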