ComfyUI_SLK_joy_caption_two  by EvilBT

ComfyUI node for image captioning

created 9 months ago
597 stars

Top 55.4% on sourcepulse

Project Summary

This repository provides a ComfyUI node for image captioning with the JoyCaptionAlpha Two model. It is aimed at users who generate images with AI or prepare training data, and gives them finer control over caption generation, particularly when captioning datasets in batches for model training.

How It Works

The node integrates the JoyCaptionAlpha Two model into a ComfyUI workflow so that users can generate detailed captions for images. For batch processing, it can add a custom prefix and suffix to each caption, which helps keep datasets organized when preparing training data. Sampling parameters such as top_p and temperature are exposed so that caption generation can be adjusted.
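The prefix/suffix behavior described above can be pictured with a small sketch. This is not the node's code: the function names build_caption and caption_batch, the separator argument, and the sample file names are all hypothetical and exist only for illustration.

```python
# Illustrative sketch only; names and behavior are assumptions,
# not taken from the ComfyUI_SLK_joy_caption_two source.

def build_caption(raw: str, prefix: str = "", suffix: str = "", separator: str = " ") -> str:
    """Join prefix, generated caption, and suffix, skipping empty parts."""
    parts = [part.strip() for part in (prefix, raw, suffix)]
    return separator.join(part for part in parts if part)


def caption_batch(raw_captions: dict, prefix: str = "", suffix: str = "") -> dict:
    """Apply the same prefix and suffix to every caption in a batch.

    raw_captions maps an image file name to the caption text that a
    captioning model produced for it.
    """
    return {
        name: build_caption(text, prefix, suffix)
        for name, text in raw_captions.items()
    }


if __name__ == "__main__":
    batch = {
        "0001.png": "a cat sleeping on a sofa",
        "0002.png": "a red bicycle leaning against a wall",
    }
    for name, caption in caption_batch(batch, prefix="mystyle,", suffix="high detail").items():
        print(name, "->", caption)
```

A fixed prefix is commonly used as a trigger word in training captions, which is one reason a batch-level prefix option is convenient for dataset preparation.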

Quick Start & Requirements

  • Installation: Install via Comfy Manager by searching for "JoyCaptionAlpha Two for ComfyUI", or manually clone the repository into custom_nodes and install dependencies with pip install -r ComfyUI_SLK_joy_caption_two/requirements.txt.
  • Prerequisites: Requires ComfyUI. Manual model downloads are necessary for google/siglip-so400m-patch14-384 and the Joy-Caption-alpha-two model, which should be placed in specific subdirectories within ComfyUI's models folder. Llama 3.1 8B models can be automatically downloaded or manually placed.
  • Environment: Tested in an 8GB VRAM environment.
  • Documentation: Example workflows are available in the examples/workflows.png file.

Highlighted Details

  • Advanced batch captioning with prefix/suffix options for training data.
  • Support for configurable top_p and temperature parameters.
  • Compatibility with different Llama 3.1 8B model versions (e.g., unsloth/Meta-Llama-3.1-8B-Instruct-bnb-4bit).
  • Option to integrate with AIGODLIKE-ComfyUI-Translation for Chinese language support.
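For readers unfamiliar with the top_p and temperature parameters listed above, the following is a generic sketch of temperature scaling followed by nucleus (top-p) sampling over a toy list of logits. It shows the standard technique in plain Python and is an assumption about what these parameters mean here, not a description of how this node or the underlying Llama model implements them. All function names are hypothetical.

```python
# Generic temperature + top-p (nucleus) sampling sketch.
# Not taken from the node's source; names are illustrative.
import math
import random


def softmax_with_temperature(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of most likely tokens whose total mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    # Renormalize so the kept probabilities sum to 1.
    return {i: probs[i] / mass for i in kept}


def sample_token(logits, temperature=1.0, top_p=0.9, rng=None):
    """Sample one token index after temperature scaling and top-p filtering."""
    rng = rng or random.Random()
    nucleus = top_p_filter(softmax_with_temperature(logits, temperature), top_p)
    indices = list(nucleus)
    weights = [nucleus[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = [1.0, 5.0, 2.0, 0.5]
    print(sample_token(toy_logits, temperature=0.7, top_p=0.9, rng=random.Random(0)))
```

In general, lower temperature and lower top_p make output more repeatable, while higher values allow more varied wording.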

Maintenance & Community

The project is actively maintained with recent updates addressing bugs and adding features. Users can report issues via GitHub issues.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README describes the node as "not fully tested" and encourages users to report issues. The only hardware information it gives is that the node was tested in an 8GB VRAM environment; requirements for other configurations are not stated.

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 8

Star History

85 stars in the last 90 days
