ComfyUI-QwenVL by 1038lab

Multimodal AI integration for ComfyUI

Created 11 months ago

686 stars

Top 49.6% on SourcePulse

Project Summary

This ComfyUI custom node integrates Alibaba Cloud's Qwen-VL series of vision-language models, including Qwen3-VL and Qwen2.5-VL. It lets users run multimodal tasks such as text generation, image understanding, and video analysis directly within ComfyUI workflows.

How It Works

The node loads Qwen-VL models inside ComfyUI so that a workflow can pass them both visual and textual inputs. Models are downloaded automatically from Hugging Face, and weights can be loaded at a selectable precision (4-bit or 8-bit quantization applied on the fly, or FP16) to trade output quality against VRAM usage on the available hardware. The node processes either single images or sequences of video frames.
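To make the VRAM trade-off concrete, here is a minimal sketch of how one might pick among the three precisions for a given GPU. It is not the node's own logic: the function names, the 0.8 headroom factor, and the bytes-per-parameter figures are illustrative assumptions, and the estimate covers weights only (activations and the KV cache need additional memory).

```python
# Approximate bytes needed per model parameter at each precision.
# These are rule-of-thumb figures, not values taken from the node.
BYTES_PER_PARAM = {"FP16": 2.0, "8-bit": 1.0, "4-bit": 0.5}


def estimate_weights_gb(params_billions: float, mode: str) -> float:
    """Rough weight footprint in GB: billions of params times bytes per param."""
    return params_billions * BYTES_PER_PARAM[mode]


def pick_mode(params_billions: float, vram_gb: float, headroom: float = 0.8):
    """Return the highest-precision mode whose weights fit in headroom * VRAM.

    `headroom` (hypothetical default 0.8) reserves part of the VRAM for
    activations. Returns None when even 4-bit weights do not fit.
    """
    budget = vram_gb * headroom
    for mode in ("FP16", "8-bit", "4-bit"):  # try highest precision first
        if estimate_weights_gb(params_billions, mode) <= budget:
            return mode
    return None


if __name__ == "__main__":
    # An 8B-parameter model on GPUs of different sizes.
    for vram in (24, 12, 6):
        print(f"{vram} GB VRAM -> {pick_mode(8, vram)}")
```

Under these assumptions an 8B-parameter model would run in FP16 on a 24 GB card, in 8-bit on a 12 GB card, and in 4-bit on a 6 GB card. Actual requirements depend on the model variant, input resolution, and sequence length.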

Quick Start & Requirements

Installation: Clone the repository into your ComfyUI/custom_nodes directory and install dependencies via pip install -r requirements.txt.
Prerequisites: A functional ComfyUI installation. Models are downloaded automatically on first use.
Links: GitHub Repository

Highlighted Details

Offers both Standard and Advanced nodes for varying levels of control.
Supports a preset and custom prompt system for flexible input.
Features automatic model downloading and on-the-fly precision selection (4-bit or 8-bit quantization, or FP16).
Includes hardware-aware safeguards, particularly for FP8 model compatibility.
Accepts image and video (frame sequence) inputs.
"Keep Model Loaded" option avoids reloading the model between sequential runs.
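The idea behind a "Keep Model Loaded" option can be sketched as a single-slot cache: reuse the loaded model when the next run asks for the same model and precision, and release it otherwise. This is an illustration only, not the node's code; `ModelCache`, its `loader` callback, and the `load_count` counter are hypothetical names, and the loader here is a stand-in for a real model load.

```python
class ModelCache:
    """Holds at most one loaded model, keyed by (name, quantization)."""

    def __init__(self, loader):
        self._loader = loader  # callable(name, quantization) -> model
        self._key = None
        self._model = None
        self.load_count = 0    # how many times the loader actually ran

    def get(self, name, quantization, keep_loaded=True):
        key = (name, quantization)
        if self._model is not None and self._key == key:
            model = self._model  # cache hit: skip the expensive load
        else:
            model = self._loader(name, quantization)
            self.load_count += 1
        if keep_loaded:
            # Replace whatever was held, so only one model stays resident.
            self._key, self._model = key, model
        else:
            # Drop our reference so the memory can be reclaimed after use.
            self._key, self._model = None, None
        return model


if __name__ == "__main__":
    # Stand-in loader; a real one would load model weights.
    cache = ModelCache(lambda name, quant: {"name": name, "quant": quant})
    cache.get("Qwen3-VL", "4-bit")
    cache.get("Qwen3-VL", "4-bit")  # reused, no second load
    print(cache.load_count)
```

Holding only one model at a time keeps VRAM use bounded; switching the model or the precision triggers a fresh load.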

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord/Slack), or roadmap beyond completed features are provided in the README.

Licensing & Compatibility

Released under the GPL-3.0 license. Commercial use is permitted, but this copyleft license requires anyone who distributes a derivative work to release it under GPL-3.0 as well, which generally rules out bundling the node into closed-source products.

Limitations & Caveats

GGUF support, intended to broaden CPU and hardware compatibility, is listed as a future plan and is not yet available. The README does not document other known limitations or state an alpha status.

Explore Similar Projects

tiny-qwen by Emericen

Comfyui-zhenzhen by T8mars

ComfyUI-DyPE by wildminder

jimeng-free-api-all by zhizinan1997

Lumina-mGPT-2.0 by Alpha-VLLM

ComfyUI_Qwen3-VL-Instruct by IuvenisSapiens

sophon-demo by sophgo

CogVLM2 by zai-org

InternLM-XComposer by InternLM

vllm-omni by vllm-project

sdnext by vladmandic

ComfyUI-WanVideoWrapper by kijai