Comfyui_image2prompt by zhongpei

ComfyUI nodes for image-to-prompt workflows

created 1 year ago
350 stars

Top 80.6% on sourcepulse

Project Summary

This ComfyUI plugin improves image-to-prompt generation by combining multiple vision-language models and taggers, producing tailored descriptions of characters and scenes. It targets ComfyUI users who want more accurate, detailed prompts for text-to-image generation, including prompts refined with 7B-class language models.

How It Works

The plugin combines several models: wd-swinv2-tagger-v3 for precise character tagging, moondream1 for detailed scene descriptions, and moondream2 for concise scene descriptions. Its Text2GPTPrompt node builds optimized prompts with 7B models such as qwen1.5-7b and deepseek-ai/deepseek-vl-7b-chat, and fine-tuned models such as hahafofo/Qwen-1_8B-Stable-Diffusion-Prompt add more varied prompt styles, including classical poetry.
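The general idea of pairing a tagger with a captioner can be illustrated in plain Python. This is a minimal sketch, not the plugin's code: the helper name `merge_prompt`, the 0.35 confidence threshold, and the input formats (a tag-to-score dict from a WD-style tagger and a caption string from a Moondream-style model) are all assumptions made for illustration.

```python
from typing import Dict


def merge_prompt(caption: str, tag_scores: Dict[str, float], threshold: float = 0.35) -> str:
    """Hypothetical helper: combine a VLM caption with tagger tags into one prompt.

    Tags scoring at or above `threshold` are kept in descending-confidence
    order, underscores become spaces (common for booru-style tags), and any
    tag whose text already appears in the caption is skipped.
    """
    caption = caption.strip().rstrip(".")
    caption_lower = caption.lower()
    kept = [
        tag.replace("_", " ")
        for tag, score in sorted(tag_scores.items(), key=lambda kv: kv[1], reverse=True)
        if score >= threshold
    ]
    # Simple substring check: crude, but enough to avoid obvious repeats.
    extra = [t for t in kept if t.lower() not in caption_lower]
    return ", ".join([caption] + extra) if caption else ", ".join(extra)
```

For example, `merge_prompt("A girl standing in a field of flowers.", {"1girl": 0.98, "long_hair": 0.9, "flower": 0.6, "outdoors": 0.3})` keeps the caption first, drops `outdoors` (below threshold) and `flower` (already in the caption), and appends `1girl, long hair`. Putting the caption first keeps the scene description dominant while the tags add character detail.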

Quick Start & Requirements

  • Install via git clone https://github.com/zhongpei/Comfyui-image2prompt into ComfyUI's custom_nodes directory.
  • Requires downloading models (e.g., vikhyatk/moondream1, vikhyatk/moondream2, internlm/internlm-xcomposer2-vl-7b, unum-cloud/uform-gen2-qwen-500m) into ComfyUI/models/image2text.
  • Automatic model download on first run is supported; manual download via huggingface-cli is an alternative.
  • See the project README for detailed model download instructions.
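A scripted alternative to downloading models by hand can be sketched with `snapshot_download` from the `huggingface_hub` library. The folder layout is an assumption: this sketch places each repo in `ComfyUI/models/image2text/<repo name>` (for example `vikhyatk/moondream2` goes to `models/image2text/moondream2`), so check the README for the exact folder names the plugin expects. The helpers `model_dir` and `ensure_model` are hypothetical.

```python
from pathlib import Path


def model_dir(comfy_root: str, repo_id: str) -> Path:
    """Return the assumed local folder for a Hugging Face repo.

    Assumption: models live in ComfyUI/models/image2text/<last part of repo id>.
    """
    return Path(comfy_root) / "models" / "image2text" / repo_id.split("/")[-1]


def ensure_model(comfy_root: str, repo_id: str) -> Path:
    """Download `repo_id` with huggingface_hub unless its folder already has files."""
    target = model_dir(comfy_root, repo_id)
    if target.is_dir() and any(target.iterdir()):
        return target  # already present, nothing to do
    # Imported lazily so model_dir() works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target
```

A call such as `ensure_model("/path/to/ComfyUI", "vikhyatk/moondream2")` would fetch the repo once and return immediately on later runs.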

Highlighted Details

  • wd-swinv2-tagger-v3 enhances character description accuracy.
  • Recommends moondream1 + wd-swinv2-tagger-v3 for scenes, moondream2 + wd-swinv2-tagger-v3 for characters.
  • Text2GPTPrompt node customizes prompts for 7B models.
  • Supports T5 models like roborovski/superprompt-v1 and integrates ImageReward for aesthetic evaluation.
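Aesthetic evaluation of this kind usually means scoring several candidates and keeping the best. The sketch below shows that best-of-N ranking pattern with a pluggable scorer; it does not call ImageReward itself, and the names `rank_by_score` and `overlap` are hypothetical. In practice `score_fn` would wrap a preference model such as ImageReward, while the toy word-overlap scorer here exists only to make the example runnable.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def rank_by_score(
    prompt: str,
    candidates: Sequence[T],
    score_fn: Callable[[str, T], float],
) -> List[Tuple[float, T]]:
    """Score each candidate against `prompt`; return (score, candidate) pairs, best first."""
    scored = [(score_fn(prompt, c), c) for c in candidates]
    # Sort on the score only; Python's sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


def overlap(prompt: str, caption: str) -> float:
    """Toy scorer for demonstration: fraction of prompt words found in the caption."""
    words = prompt.lower().split()
    caption_words = caption.lower().split()
    return sum(w in caption_words for w in words) / len(words)
```

With `rank_by_score("red fox snow", ["a red fox", "a red fox in the snow", "a cat"], overlap)`, the candidate mentioning all three words ranks first. Keeping the scorer as a parameter means a real aesthetic model can be swapped in without changing the ranking logic.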

Maintenance & Community

  • Project is hosted on GitHub. No specific community channels or notable contributors are listed in the README.

Licensing & Compatibility

  • The README does not explicitly state a license. Each underlying model carries its own license (e.g., moondream2 is released under Apache 2.0), so suitability for commercial use depends on the licenses of the specific models you use.

Limitations & Caveats

Manual model downloading may be required if automatic download fails. The README suggests specific model pairings for optimal results, implying that other combinations might be less effective.
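When the automatic download fails, a small check can list which models are still missing and print the equivalent manual `huggingface-cli download` commands. This is a sketch under the same assumed layout as above (`ComfyUI/models/image2text/<repo name>`); the helper names `missing_models` and `manual_download_command` are hypothetical, while `huggingface-cli download <repo> --local-dir <dir>` is the standard Hugging Face CLI form.

```python
import shlex
from pathlib import Path
from typing import Iterable, List


def missing_models(comfy_root: str, repo_ids: Iterable[str]) -> List[str]:
    """Return repo ids whose assumed folder under models/image2text is absent or empty."""
    base = Path(comfy_root) / "models" / "image2text"
    missing = []
    for repo_id in repo_ids:
        folder = base / repo_id.split("/")[-1]  # assumed naming: last part of the repo id
        if not (folder.is_dir() and any(folder.iterdir())):
            missing.append(repo_id)
    return missing


def manual_download_command(comfy_root: str, repo_id: str) -> str:
    """Build a shell-quoted `huggingface-cli download` command for one model."""
    target = Path(comfy_root) / "models" / "image2text" / repo_id.split("/")[-1]
    return f"huggingface-cli download {shlex.quote(repo_id)} --local-dir {shlex.quote(str(target))}"
```

Looping over `missing_models("/path/to/ComfyUI", ["vikhyatk/moondream1", "vikhyatk/moondream2"])` and printing `manual_download_command(...)` for each result gives a copy-pasteable list of fallback downloads.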

Health Check
Last commit: 1 month ago
Responsiveness: 1 week
Pull Requests (30d): 0
Issues (30d): 0
Star History: 12 stars in the last 90 days
