ComfyUI-Miaoshouai-Tagger  by miaoshouai

ComfyUI extension for enhanced image captioning via fine-tuned Florence-2 model

created 1 year ago
436 stars

Top 69.4% on sourcepulse

Project Summary

This project provides an advanced image captioning and tagging tool for ComfyUI, leveraging Microsoft's Florence-2 model fine-tuned on a curated dataset of Civitai images and tags. It aims to offer higher accuracy and relevance than existing taggers like WD14, specifically for Stable Diffusion workflows, by generating tags that better align with typical image generation prompts.

How It Works

The tool uses a Florence-2 model fine-tuned for prompt generation to produce descriptive captions and keyword tags for images. Because it is implemented as ComfyUI nodes, its output can be chained with other nodes to build larger image processing workflows. The stated goal is to improve image training data with tags that are more accurate and contextually relevant than those from general-purpose vision models.
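As a rough illustration of the captioning step outside ComfyUI, the sketch below follows the commonly documented Florence-2 usage pattern in the transformers library. It is not the extension's own code. The Hugging Face model ID, the task token, and the helper names `caption_image` and `split_tags` are assumptions that do not appear in this summary.

```python
def caption_image(image_path,
                  task="<MORE_DETAILED_CAPTION>",  # assumed task token
                  model_id="MiaoshouAI/Florence-2-base-PromptGen-v1.5"):  # assumed model ID
    """Run one Florence-2 captioning pass and return the generated text."""
    # Heavy imports stay inside the function so split_tags works without them.
    from PIL import Image
    from transformers import AutoModelForCausalLM, AutoProcessor

    # Florence-2 checkpoints ship custom model code, hence trust_remote_code.
    model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
    processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

    image = Image.open(image_path).convert("RGB")
    inputs = processor(text=task, images=image, return_tensors="pt")
    generated_ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=1024,
        do_sample=False,
        num_beams=3,
    )
    raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
    parsed = processor.post_process_generation(
        raw, task=task, image_size=(image.width, image.height)
    )
    return parsed[task]


def split_tags(text):
    """Hypothetical helper: split comma-separated output into unique tags."""
    seen = set()
    tags = []
    for part in text.split(","):
        tag = part.strip()
        key = tag.lower()  # compare case-insensitively, keep first spelling
        if tag and key not in seen:
            seen.add(key)
            tags.append(tag)
    return tags
```

Calling `caption_image("photo.png")` downloads the model on first use, so it needs network access and enough memory for the chosen checkpoint. `split_tags` only applies when the output is a comma-separated tag list rather than a prose caption.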

Quick Start & Requirements

  • Install by cloning the repository into the ComfyUI/custom_nodes folder.
  • Install dependencies with pip install -r requirements.txt; transformers version 4.38.0 or higher is required.
  • Models are automatically downloaded to ComfyUI/LLM on first use.
  • Official documentation and example workflows are available within the repository.
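The installation steps above can be checked with a small preflight script. This is a minimal sketch, not part of the project: the function names are hypothetical, and it assumes the cloned folder keeps the repository name ComfyUI-Miaoshouai-Tagger. The minimum transformers version is passed in rather than hard-coded, so it should be taken from requirements.txt.

```python
import re
from importlib import metadata
from pathlib import Path


def parse_version(text: str) -> tuple:
    """Turn '4.38.0' into (4, 38, 0); stops at the first non-numeric part."""
    numbers = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two dotted versions numerically, padding the shorter with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def preflight(comfy_root: Path, minimum_transformers: str) -> list:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []

    # Assumed folder name: git clone keeps the repository name by default.
    node_dir = comfy_root / "custom_nodes" / "ComfyUI-Miaoshouai-Tagger"
    if not node_dir.is_dir():
        problems.append(f"extension not found at {node_dir}")

    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        problems.append("transformers is not installed")
    else:
        if not meets_minimum(installed, minimum_transformers):
            problems.append(
                f"transformers {installed} is older than {minimum_transformers}"
            )
    return problems
```

Calling `preflight(Path("ComfyUI"), minimum)`, with `minimum` set to the version stated in requirements.txt, returns an empty list when the extension folder exists and the installed transformers version is recent enough.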

Highlighted Details

  • Fine-tuned on Civitai images and tags for improved prompt alignment.
  • Supports multiple Florence-2 model versions, including Florence-2-base-PromptGen-v1.5 and Florence-2-large-PromptGen-v1.5.
  • Includes a random prompt widget for varied output.
  • Offers a separate node for Flux CLIP text encoder integration.

Maintenance & Community

The most recent release noted is v1.4 (November 2024), which added support for new model versions and fixed configuration issues. The README does not describe community support channels or future development plans.

Licensing & Compatibility

The repository does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The absence of a stated license may hinder commercial adoption. No benchmarks are provided to show how well the Civitai fine-tuning performs across different use cases.

Health Check
Last commit: 3 months ago
Responsiveness: 1 week
Pull Requests (30d): 0
Issues (30d): 3
Star History: 47 stars in the last 90 days
