ComfyUI_SLK_joy_caption_two by EvilBT

ComfyUI node for image captioning

Created 1 year ago

711 stars

Top 48.2% on SourcePulse

Project Summary

This repository provides a ComfyUI node for image captioning with the JoyCaptionAlpha Two model. It targets users who generate images with AI or prepare training datasets, and it gives them control over caption generation for batch processing and for building fine-tuning data.

How It Works

The node integrates the JoyCaptionAlpha Two model into a ComfyUI workflow so that users can generate detailed captions for images. For batch processing, it can add a custom prefix and suffix to every caption, which helps keep datasets organized when preparing them for model training. The sampling parameters top_p and temperature are exposed so that caption generation can be tuned.
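The prefix and suffix step described above can be illustrated with a short sketch. This is not the node's own code: the function names, the comma separator, and the one-text-file-per-image layout are assumptions chosen for illustration.

```python
from pathlib import Path
from typing import Dict, List


def apply_affixes(caption: str, prefix: str = "", suffix: str = "") -> str:
    """Wrap a generated caption with an optional prefix and suffix."""
    parts = [prefix.strip(), caption.strip(), suffix.strip()]
    # Skip empty pieces so no stray separators appear in the result.
    return ", ".join(p for p in parts if p)


def write_caption_files(captions: Dict[str, str], out_dir: Path,
                        prefix: str = "", suffix: str = "") -> List[Path]:
    """Write one .txt file per image name (hypothetical sidecar layout)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image_name, caption in captions.items():
        # image.png -> image.txt, placed in the output directory.
        target = out_dir / (Path(image_name).stem + ".txt")
        target.write_text(apply_affixes(caption, prefix, suffix), encoding="utf-8")
        written.append(target)
    return written
```

A call such as apply_affixes("a cat on a sofa", prefix="mystyle") puts a trigger word in front of the caption, a common way to tag every image in a training set with the same token.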

Quick Start & Requirements

Installation: Install via Comfy Manager by searching for "JoyCaptionAlpha Two for ComfyUI", or manually clone the repository into custom_nodes and install dependencies with pip install -r ComfyUI_SLK_joy_caption_two/requirements.txt.
Prerequisites: Requires ComfyUI. Manual model downloads are necessary for google/siglip-so400m-patch14-384 and the Joy-Caption-alpha-two model, which should be placed in specific subdirectories within ComfyUI's models folder. Llama 3.1 8B models can be automatically downloaded or manually placed.
Environment: Tested in an 8GB VRAM environment.
Documentation: Example workflows are available in the examples/workflows.png file.
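The manual installation path can be sketched as a small helper that builds the two commands without running them. The repository URL is an assumption inferred from the project and author names, and the function name is hypothetical; check both against the actual README before use.

```python
from pathlib import Path
from typing import List

# Assumption: URL inferred from the project name and author, not quoted from the README.
REPO_URL = "https://github.com/EvilBT/ComfyUI_SLK_joy_caption_two.git"


def manual_install_commands(comfy_root: str, python_exe: str = "python") -> List[List[str]]:
    """Return the clone and dependency-install commands for a manual setup."""
    node_dir = Path(comfy_root) / "custom_nodes" / "ComfyUI_SLK_joy_caption_two"
    return [
        # Step 1: clone the node into ComfyUI's custom_nodes folder.
        ["git", "clone", REPO_URL, str(node_dir)],
        # Step 2: install the node's Python dependencies.
        [python_exe, "-m", "pip", "install", "-r", str(node_dir / "requirements.txt")],
    ]
```

Each returned command can be passed to subprocess.run(cmd, check=True). The manual model downloads listed under Prerequisites are a separate step and are not covered here.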

Highlighted Details

Advanced batch captioning with prefix/suffix options for training data.
Support for configurable top_p and temperature parameters.
Compatibility with different Llama 3.1 8B model versions (e.g., unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit).
Option to integrate with AIGODLIKE-ComfyUI-Translation for Chinese language support.
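To show what the top_p and temperature parameters generally control, here is a minimal, library-free sketch of temperature scaling followed by nucleus (top_p) sampling over a list of token scores. It illustrates the general technique only; the function name and default values are assumptions, and the node's actual sampling is handled by its underlying model code.

```python
import math
import random
from typing import List, Optional


def sample_top_p(logits: List[float], temperature: float = 0.6,
                 top_p: float = 0.9, rng: Optional[random.Random] = None) -> int:
    """Pick a token index using temperature scaling and nucleus filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    # Temperature: divide logits before softmax; lower values sharpen the distribution.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # top_p: keep the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Sample among the kept tokens, weighted by their probabilities.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]
```

Lower temperature and lower top_p both make captions more predictable; raising either one allows less likely words and more varied output.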

Maintenance & Community

The project is actively maintained with recent updates addressing bugs and adding features. Users can report issues via GitHub issues.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README describes the node as "not fully tested" and encourages users to report issues. Testing was done in an 8GB VRAM environment, so behavior on GPUs with less memory is not documented.

Explore Similar Projects

Comfyui_CXH_joy_caption by StartHua

fromage by kohjingyu

lp-music-caps by seungheondoh

Comfyui_image2prompt by zhongpei

ClipCap-Chinese by yangjianxin1

VisualGPT by Vision-CAIR

VLP by LuoweiZhou

Mengzi by Langboat

joycaption by fpgaminer

shortrocity by unconv

Caption-Anything by ttengwang

CLIP_prefix_caption by rmokady