Comfyui_CXH_joy_caption by StartHua

ComfyUI extension for image captioning and tagging workflows

Created 1 year ago
603 stars

Top 54.3% on SourcePulse

Project Summary

This repository provides ComfyUI nodes for image captioning and prompt generation, aimed at users of Stable Diffusion and similar image-generation models. It integrates several image-understanding models, including Joy_caption, MiniCPMv2_6, and Florence-2, and supports batch processing, giving users more control over the prompts they use for AI art generation.

How It Works

The project offers ComfyUI nodes that use several image-understanding models for captioning and text generation: Joy_caption for detailed image descriptions, MiniCPMv2_6 for prompt generation, and Florence-2 for general captioning and prompt engineering. Because each model is exposed as its own node, users can combine them into tailored workflows. The stated aim of this modular approach is faster processing and higher-quality output than a single-model setup.
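For readers new to ComfyUI extensions, the sketch below shows the general shape of a captioning node written with ComfyUI's standard custom-node conventions (INPUT_TYPES, RETURN_TYPES, FUNCTION, CATEGORY, and the module-level NODE_CLASS_MAPPINGS). It is not this project's code: the class name ExampleCaptionNode, the BACKENDS registry, and the stub functions are hypothetical placeholders marking where real model loading and inference would go.

```python
from typing import Any, Callable, Dict, Tuple


# Hypothetical stand-ins for real model backends. In a real extension each
# would load its model once and run inference on the IMAGE tensor.
def _florence2_stub(image: Any, prompt: str) -> str:
    return f"[Florence-2] {prompt}"


def _minicpm_stub(image: Any, prompt: str) -> str:
    return f"[MiniCPMv2_6] {prompt}"


def _joy_caption_stub(image: Any, prompt: str) -> str:
    return f"[Joy_caption] {prompt}"


BACKENDS: Dict[str, Callable[[Any, str], str]] = {
    "Florence-2": _florence2_stub,
    "MiniCPMv2_6": _minicpm_stub,
    "Joy_caption": _joy_caption_stub,
}


class ExampleCaptionNode:
    """Hypothetical ComfyUI node: IMAGE in, caption STRING out."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "image": ("IMAGE",),
                # A list of strings as the type makes ComfyUI show a dropdown.
                "model": (list(BACKENDS.keys()),),
                "prompt": ("STRING", {"default": "Describe this image.", "multiline": True}),
            }
        }

    RETURN_TYPES = ("STRING",)
    FUNCTION = "caption"  # name of the method ComfyUI calls
    CATEGORY = "example/captioning"

    def caption(self, image: Any, model: str, prompt: str) -> Tuple[str]:
        text = BACKENDS[model](image, prompt)
        return (text,)  # outputs are returned as a tuple matching RETURN_TYPES


# ComfyUI discovers custom nodes through these module-level mappings.
NODE_CLASS_MAPPINGS = {"ExampleCaptionNode": ExampleCaptionNode}
NODE_DISPLAY_NAME_MAPPINGS = {"ExampleCaptionNode": "Example Caption (sketch)"}
```

Choosing a different entry in the model dropdown routes the same image to a different backend, and the STRING output can be wired into any downstream node that accepts text.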

Quick Start & Requirements

  • Install dependencies: python -m pip install -r requirements.txt, or run install_req.bat on Windows.
  • Ensure the transformers library is up to date.
  • Models can be automatically downloaded by ComfyUI or manually placed in specified directories (e.g., models\Joy_caption_alpha, clip/siglip-so400m-patch14-384, LLM/Meta-Llama-3.1-8B-bnb-4bit).
  • Manual download is recommended for some models, with links provided in the README.
  • A working ComfyUI installation is required as the base environment.
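Because some models must be downloaded by hand, a quick check that they are in place can save a failed run. This minimal Python sketch assumes that all three example paths listed above are relative to ComfyUI's models directory (the summary writes only the first with a models prefix); the function name missing_model_dirs is hypothetical, and the project README remains the authority on the exact layout.

```python
from pathlib import Path
from typing import List, Tuple

# Assumption: these paths are relative to ComfyUI's "models" directory.
# Stored as path parts so the check works with either separator style.
EXPECTED_MODEL_DIRS: List[Tuple[str, ...]] = [
    ("Joy_caption_alpha",),
    ("clip", "siglip-so400m-patch14-384"),
    ("LLM", "Meta-Llama-3.1-8B-bnb-4bit"),
]


def missing_model_dirs(comfyui_root: Path) -> List[Path]:
    """Return expected model directories that are missing or empty."""
    models_root = comfyui_root / "models"
    missing = []
    for parts in EXPECTED_MODEL_DIRS:
        path = models_root.joinpath(*parts)
        # An empty directory is treated the same as a missing one.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(path)
    return missing


if __name__ == "__main__":
    import sys

    root = Path(sys.argv[1]) if len(sys.argv) > 1 else Path.cwd()
    for path in missing_model_dirs(root):
        print(f"missing or empty: {path}")
```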

Highlighted Details

  • Supports batch folder tagging and batch image classification.
  • Claimed per-image processing time, fastest to slowest: Florence-2, MiniCPMv2_6, Joy_caption (4-5 seconds per image on an RTX 4090).
  • Integrates MiniCPM3-4B for chat, translation, and text rewriting.
  • Includes support for Florence-2-large-PromptGen-v1.5 and Florence-2-base-PromptGen-v1.5.

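Batch folder tagging is typically used to caption an image dataset in one pass. The sketch below assumes the common convention of writing each caption to a .txt file with the same name as its image; that convention is an assumption rather than something taken from this project, and tag_folder and caption_fn are hypothetical names, with caption_fn marking where a call to Florence-2, MiniCPMv2_6, or Joy_caption would go.

```python
from pathlib import Path
from typing import Callable, List

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".webp"}


def tag_folder(
    folder: Path,
    caption_fn: Callable[[Path], str],  # hypothetical hook to a captioning model
    overwrite: bool = False,
) -> List[Path]:
    """Write <image stem>.txt next to each image; return the files written."""
    written = []
    # sorted() materializes the listing first, so new .txt files are not revisited.
    for image_path in sorted(folder.iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        txt_path = image_path.with_suffix(".txt")
        if txt_path.exists() and not overwrite:
            continue  # keep captions from an earlier run
        txt_path.write_text(caption_fn(image_path).strip() + "\n", encoding="utf-8")
        written.append(txt_path)
    return written
```

Skipping images that already have a caption lets an interrupted run resume where it stopped. At the claimed 4-5 seconds per image, a 1,000-image folder would take roughly 67 to 83 minutes.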
Maintenance & Community

  • Update dates listed in the README (e.g., 2024-10-30, 2024-10-16) indicate when the project was last actively developed.
  • Model download links point to Hugging Face and Baidu Netdisk.

Licensing & Compatibility

  • The README does not explicitly state a license.
  • Compatibility with commercial or closed-source projects is not specified.

Limitations & Caveats

The project relies on external model downloads, some of which must be done manually. Dependencies such as transformers have version requirements, and compatibility with different ComfyUI versions is not documented.

Health Check

Last commit: 7 months ago
Responsiveness: Inactive
Pull requests (30d): 0
Issues (30d): 0
Star history: 6 stars in the last 30 days

Explore Similar Projects

Starred by Jiayi Pan (Author of SWE-Gym; MTS at xAI), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 1 more.

METER by zdou0830

0%
373
Multimodal framework for vision-and-language transformer research
Created 3 years ago
Updated 2 years ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Yaowei Zheng (Author of LLaMA-Factory), and 1 more.

CLIP_prefix_caption by rmokady

0.1%
1k
Image captioning model using CLIP embeddings as a prefix
Created 4 years ago
Updated 1 year ago