ComfyUI nodes for Florence2 vision-language model inference
This repository provides ComfyUI custom nodes for running Microsoft's Florence-2 vision-language model (VLM). It enables users to perform a range of vision tasks, including object detection, segmentation, captioning, and notably Document Visual Question Answering (DocVQA), by leveraging Florence-2's prompt-based approach and sequence-to-sequence architecture.
How It Works
Florence-2 is a powerful VLM trained on the extensive FLD-5B dataset, allowing it to handle diverse vision-language tasks through simple text prompts. This node integrates Florence-2 into the ComfyUI workflow, facilitating tasks like DocVQA by allowing users to ask questions about document images and receive answers derived from the visual and textual content.
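To illustrate the prompt-based interface the nodes build on, here is a minimal sketch of running Florence-2 directly through Hugging Face transformers, independent of ComfyUI. It follows the usage pattern from the microsoft/Florence-2-base model card; the image URL and generation settings are illustrative choices, not taken from this repository.

```python
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Florence-2 ships custom modeling code, hence trust_remote_code=True.
model_id = "microsoft/Florence-2-base"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open(requests.get(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/car.jpg",
    stream=True).raw)

# Tasks are selected purely by text prompt, e.g. <CAPTION>, <OD>, <OCR>.
task = "<CAPTION>"
inputs = processor(text=task, images=image, return_tensors="pt")

generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=256,
    num_beams=3,
)
raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]

# post_process_generation parses task-specific output
# (plain text for captions, boxes/labels for detection tasks).
parsed = processor.post_process_generation(
    raw, task=task, image_size=(image.width, image.height))
print(parsed)
```

Swapping the task token (for example <OD> for object detection or <OCR> for text extraction) is all it takes to switch tasks; the ComfyUI nodes expose this choice as a node parameter.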
Quick Start & Requirements
- Clone the repository into the ComfyUI/custom_nodes directory.
- Install dependencies with pip install -r requirements.txt (requires transformers>=4.38.0).
- Florence-2 model files are placed under ComfyUI/models/LLM.
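For the DocVQA use case highlighted above, a hedged sketch of what an equivalent direct call might look like follows. The checkpoint name (HuggingFaceM4/Florence-2-DocVQA), the <DocVQA> task token, the file name invoice.png, and the question are all assumptions for illustration; the node in this repository wraps comparable logic and handles model loading from ComfyUI/models/LLM internally.

```python
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

# Assumption: a DocVQA fine-tune such as HuggingFaceM4/Florence-2-DocVQA;
# the base microsoft/Florence-2 checkpoints do not ship a <DocVQA> task token.
model_id = "HuggingFaceM4/Florence-2-DocVQA"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("invoice.png")  # any document image on disk (hypothetical file)

# The natural-language question is appended directly after the task token.
prompt = "<DocVQA>What is the invoice total?"
inputs = processor(text=prompt, images=image, return_tensors="pt")

generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=128,
)
answer = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(answer)
```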
Highlighted Details
Maintenance & Community
No specific community links or maintenance details are provided in the README.
Licensing & Compatibility
The README does not specify a license. Compatibility for commercial use or closed-source linking is not mentioned.
Limitations & Caveats
Accuracy for DocVQA is dependent on input image quality and question complexity. The README does not mention specific hardware requirements beyond standard ComfyUI dependencies.