Comfyui_CXH_joy_caption  by StartHua

ComfyUI extension for image captioning and tagging workflows

created 11 months ago
591 stars

Top 55.8% on sourcepulse

Project Summary

This repository provides ComfyUI nodes for image captioning and prompt generation, aimed at users of Stable Diffusion and similar generative models. It integrates several models, including Joy_caption, MiniCPMv2_6, and Florence-2, and supports batch processing of images for AI art workflows.

How It Works

The project exposes each model as a ComfyUI node for image analysis and text generation: Joy_caption for detailed image descriptions, MiniCPMv2_6 for prompt generation, and Florence-2 for general captioning and prompt engineering. Because the nodes are modular, users can combine different models in one workflow, with the stated aim of faster processing and higher-quality output than a single-model setup.

Quick Start & Requirements

  • Install dependencies: python -m pip install -r requirements.txt or run install_req.bat.
  • Ensure the transformers library is up to date.
  • Models can be automatically downloaded by ComfyUI or manually placed in specified directories (e.g., models\Joy_caption_alpha, clip/siglip-so400m-patch14-384, LLM/Meta-Llama-3.1-8B-bnb-4bit).
  • Manual download is recommended for some models, with links provided in the README.
  • See ComfyUI for the base environment.
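
As a rough pre-flight check for the manual setup described above, the following sketch reports the installed transformers version and lists which model folders are missing. The folder names come from this summary; their placement under a single models directory, the `ComfyUI/models` path, and the helper names `installed_version` and `missing_model_dirs` are assumptions for illustration and are not part of the extension.

```python
from __future__ import annotations

from importlib import metadata
from pathlib import Path

# Folder names are taken from the summary above. Their position relative to
# the ComfyUI models directory is an assumption made for this sketch.
EXPECTED_MODEL_DIRS = [
    "Joy_caption_alpha",
    "clip/siglip-so400m-patch14-384",
    "LLM/Meta-Llama-3.1-8B-bnb-4bit",
]


def installed_version(package: str) -> str | None:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def missing_model_dirs(models_root: Path, expected=EXPECTED_MODEL_DIRS) -> list[str]:
    """List the expected model folders that do not exist under models_root."""
    return [rel for rel in expected if not (models_root / rel).is_dir()]


if __name__ == "__main__":
    root = Path("ComfyUI/models")  # hypothetical install location
    print("transformers:", installed_version("transformers") or "not installed")
    for rel in missing_model_dirs(root):
        print("missing:", root / rel)
```

Any folder reported as missing would need to be downloaded manually or left for the automatic download, depending on the model.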

Highlighted Details

  • Supports batch folder tagging and batch image classification.
  • Speed comparison as given in the README: Florence-2 < MiniCPMv2_6 < Joy_caption (4-5 seconds per image on a 4090).
  • Integrates MiniCPM3-4B for strong chat, translation, and rewriting capabilities.
  • Includes support for Florence-2-large-PromptGen-v1.5 and Florence-2-base-PromptGen-v1.5.
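
To illustrate the general idea behind batch folder tagging, here is a minimal sketch that walks a folder and writes one caption file per image. It is not the extension's implementation: the `tag_folder` helper, the `.txt` sidecar convention, the list of image suffixes, and the `placeholder_caption` stand-in for a real model call are all assumptions.

```python
from pathlib import Path
from typing import Callable

# Assumed set of image types; the extension may accept others.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def tag_folder(folder: Path, caption_fn: Callable[[Path], str]) -> int:
    """Write a .txt caption next to every image in folder; return the count."""
    written = 0
    for image_path in sorted(folder.iterdir()):
        # Skip subfolders and anything that is not a recognised image type.
        if not image_path.is_file() or image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        caption = caption_fn(image_path)
        image_path.with_suffix(".txt").write_text(caption, encoding="utf-8")
        written += 1
    return written


def placeholder_caption(image_path: Path) -> str:
    # Hypothetical stand-in for a call into a real captioning model.
    return f"caption for {image_path.name}"
```

Passing the captioner as a function keeps the folder loop independent of the model, so in this sketch any of the supported models could be swapped in behind `caption_fn`.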

Maintenance & Community

  • The README carries dated update entries (e.g., 2024-10-30, 2024-10-16), which indicate project activity.
  • Model download links point to Hugging Face and Baidu Netdisk.

Licensing & Compatibility

  • The README does not explicitly state a license.
  • Compatibility with commercial or closed-source projects is not specified.

Limitations & Caveats

The project relies on external model downloads, some of which require manual intervention. The README notes version requirements for dependencies such as transformers, but does not detail compatibility with different ComfyUI versions.

Health Check

  • Last commit: 5 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 35 stars in the last 90 days
