Comfyui_image2prompt by zhongpei

ComfyUI nodes for image-to-prompt workflows

Created 2 years ago

385 stars

Project Summary

This ComfyUI plugin improves image-to-prompt generation by combining several vision-language models and taggers, with descriptions tailored to characters and scenes. It targets ComfyUI users who want more accurate and detailed prompts for text-to-image generation, including prompts written with 7B-parameter models.

How It Works

The plugin combines three models: wd-swinv2-tagger-v3 for precise character tagging, moondream1 for detailed scene descriptions, and moondream2 for concise scene descriptions. It also provides a Text2GPTPrompt node that creates optimized prompts for 7B models such as qwen1.5-7b and deepseek-ai/deepseek-vl-7b-chat. Fine-tuned models such as hahafofo/Qwen-1_8B-Stable-Diffusion-Prompt are used for more varied prompt generation, including classical poetry.
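As a rough illustration of how tagger output and a caption could be merged into a single prompt, here is a minimal Python sketch. The function name merge_tags_and_caption, the comma-joined format, and the sample strings are hypothetical; they are not taken from the plugin's code.

```python
def merge_tags_and_caption(tags, caption):
    """Join a caption and a list of tags into one comma-separated prompt.

    Hypothetical helper for illustration only; the plugin's actual
    merging logic may differ.
    """
    seen = set()
    parts = []
    # Caption goes first (without its trailing period), then the tags.
    for part in [caption.strip().rstrip(".")] + [t.strip() for t in tags]:
        key = part.lower()
        # Skip empty strings and case-insensitive duplicates.
        if part and key not in seen:
            seen.add(key)
            parts.append(part)
    return ", ".join(parts)


prompt = merge_tags_and_caption(
    ["1girl", "red hair", "1girl"],          # made-up tagger output
    "A woman standing in a sunlit garden.",  # made-up caption
)
print(prompt)  # A woman standing in a sunlit garden, 1girl, red hair
```

Putting the caption first keeps the natural-language description at the front of the prompt, with tags acting as supplementary keywords; the opposite order would work equally well under these assumptions.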

Quick Start & Requirements

Install by running git clone https://github.com/zhongpei/Comfyui_image2prompt inside ComfyUI's custom_nodes directory.
Models (e.g., vikhyatk/moondream1, vikhyatk/moondream2, internlm/internlm-xcomposer2-vl-7b, unum-cloud/uform-gen2-qwen-500m) must be present in ComfyUI/models/image2text.
Models are downloaded automatically on first run; manual download with huggingface-cli is an alternative.
See the project README for detailed model download instructions.
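For the manual route, the following Python sketch builds one huggingface-cli download command per model listed above. It only prints the commands and does not run them. The helper name download_command and the choice of a subfolder named after the last part of each repo ID are assumptions; the folder layout the plugin actually expects should be checked in its README.

```python
import shlex
from pathlib import Path

# Model repositories named in the requirements above.
MODEL_REPOS = [
    "vikhyatk/moondream1",
    "vikhyatk/moondream2",
    "internlm/internlm-xcomposer2-vl-7b",
    "unum-cloud/uform-gen2-qwen-500m",
]


def download_command(repo_id, comfyui_root="ComfyUI"):
    """Return the argument list for downloading one model.

    Assumption: each model goes into a subfolder of models/image2text
    named after the last component of its repo ID.
    """
    target = Path(comfyui_root) / "models" / "image2text" / repo_id.split("/")[-1]
    return ["huggingface-cli", "download", repo_id, "--local-dir", target.as_posix()]


for repo in MODEL_REPOS:
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(download_command(repo)))
```

Returning an argument list instead of a single string means the same value can be passed directly to subprocess.run if the commands should be executed from Python.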

Highlighted Details

wd-swinv2-tagger-v3 enhances character description accuracy.
Recommended pairings: moondream1 + wd-swinv2-tagger-v3 for scenes; moondream2 + wd-swinv2-tagger-v3 for characters.
Text2GPTPrompt node customizes prompts for 7B models.
Supports T5 models like roborovski/superprompt-v1 and integrates ImageReward for aesthetic evaluation.
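The recommended pairings can be captured in a small lookup table, sketched here in Python. The names RECOMMENDED_PAIRINGS and recommended_models are hypothetical and exist only for this example; the plugin itself exposes these choices through its ComfyUI nodes.

```python
TAGGER = "wd-swinv2-tagger-v3"

# Subject type -> (captioning model, tagger), per the pairings listed above.
RECOMMENDED_PAIRINGS = {
    "scene": ("moondream1", TAGGER),
    "character": ("moondream2", TAGGER),
}


def recommended_models(subject):
    """Return the (captioner, tagger) pair suggested for a subject type."""
    try:
        return RECOMMENDED_PAIRINGS[subject]
    except KeyError:
        options = sorted(RECOMMENDED_PAIRINGS)
        raise ValueError(f"unknown subject {subject!r}; expected one of {options}") from None


captioner, tagger = recommended_models("character")
print(captioner, tagger)  # moondream2 wd-swinv2-tagger-v3
```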

Maintenance & Community

The project is hosted on GitHub. The README lists no community channels or notable contributors.

Licensing & Compatibility

The README does not explicitly state a license. The underlying models have their own licenses (e.g., Moondream models are typically under Apache 2.0). Whether commercial use is permitted depends on the licenses of the individual models used.

Limitations & Caveats

Manual model downloading may be required if automatic download fails. The README suggests specific model pairings for optimal results, implying that other combinations might be less effective.

Health Check

Last commit: 1 year ago

Responsiveness: 1 week

Stars in the last 30 days: 0