ComfyUI extension for enhanced image captioning via fine-tuned Florence-2 model
This project provides an advanced image captioning and tagging tool for ComfyUI, leveraging Microsoft's Florence-2 model fine-tuned on a curated dataset of Civitai images and tags. It aims to offer higher accuracy and relevance than existing taggers like WD14, specifically for Stable Diffusion workflows, by generating tags that better align with typical image generation prompts.
How It Works
The tool utilizes the Florence-2 model, fine-tuned for prompt generation, to produce descriptive captions and keywords for images. Its node-based architecture within ComfyUI allows for flexible integration and concatenation with other nodes, enabling complex image processing pipelines. This approach enhances image training data by providing more accurate and contextually relevant tags compared to general-purpose vision models.
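To make the node-based concatenation described above concrete, here is a minimal sketch of a ComfyUI-style node that merges a generated caption with extra tags. The class name `JoinTagStrings`, its inputs, and its category are hypothetical and are not taken from this project; only the general node conventions (`INPUT_TYPES`, `RETURN_TYPES`, `FUNCTION`, `NODE_CLASS_MAPPINGS`) follow how ComfyUI custom nodes are usually declared.

```python
class JoinTagStrings:
    """Hypothetical node: merges a caption string with extra tags."""

    CATEGORY = "examples/text"   # illustrative menu location
    RETURN_TYPES = ("STRING",)   # one string output
    FUNCTION = "join"            # name of the method ComfyUI would call

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                # Wired in from an upstream captioning node.
                "caption": ("STRING", {"forceInput": True}),
                # Typed by the user in the node widget.
                "extra_tags": ("STRING", {"default": "", "multiline": True}),
            }
        }

    def join(self, caption, extra_tags):
        # Split on commas, trim whitespace, and drop case-insensitive
        # duplicates while keeping the first-seen order.
        seen, merged = set(), []
        for tag in f"{caption},{extra_tags}".split(","):
            tag = tag.strip()
            if tag and tag.lower() not in seen:
                seen.add(tag.lower())
                merged.append(tag)
        # Node outputs are returned as a tuple matching RETURN_TYPES.
        return (", ".join(merged),)


# Registration mapping that ComfyUI reads from a custom node package.
NODE_CLASS_MAPPINGS = {"JoinTagStrings": JoinTagStrings}
```

Because the method takes and returns plain strings, a node like this could sit between a captioning node and a text-encoding or file-saving node without either side needing to know about the other.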
Quick Start & Requirements
Install the extension into the ComfyUI/custom_nodes folder.
Install dependencies with pip install -r requirements.txt, which requires transformers version 3.8.0 or higher.
Model files are placed in ComfyUI/LLM on first use.
Highlighted Details
Supports Florence-2-base-PromptGen-v1.5 and Florence-2-large-PromptGen-v1.5.
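As a rough illustration of how the two checkpoint names and the ComfyUI/LLM folder might fit together, the sketch below resolves a local directory for a checkpoint and reports whether it still needs to be fetched. The helper names `local_model_dir` and `needs_download`, and the one-subfolder-per-checkpoint layout, are assumptions made for this example and may not match what the extension actually does.

```python
from pathlib import Path

# Checkpoint names as listed in this summary.
SUPPORTED_MODELS = (
    "Florence-2-base-PromptGen-v1.5",
    "Florence-2-large-PromptGen-v1.5",
)


def local_model_dir(comfy_root, model_name):
    """Return the assumed on-disk folder for a checkpoint (hypothetical layout)."""
    if model_name not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {model_name}")
    # Assumption: each checkpoint lives in its own subfolder of ComfyUI/LLM.
    return Path(comfy_root) / "LLM" / model_name


def needs_download(comfy_root, model_name):
    """True when the checkpoint folder is missing or empty."""
    model_dir = local_model_dir(comfy_root, model_name)
    return not model_dir.is_dir() or not any(model_dir.iterdir())
```

A caller could check `needs_download("ComfyUI", SUPPORTED_MODELS[0])` before loading, and only trigger a download when it returns true.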
Maintenance & Community
The project has seen recent updates (v1.4 as of Nov 2024) supporting new model versions and fixing configuration issues. The README does not describe community support channels or further development plans.
Licensing & Compatibility
The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.
Limitations & Caveats
The project's licensing is not clearly defined, which may impact commercial adoption. While it aims for high accuracy, the effectiveness of the fine-tuning on Civitai data for all use cases is not benchmarked.