ComfyUI-QwenVL  by 1038lab

Multimodal AI integration for ComfyUI

Created 8 months ago
377 stars

Top 75.4% on SourcePulse

Project Summary

This ComfyUI custom node integrates Alibaba Cloud's Qwen-VL series of vision-language models, including Qwen3-VL and Qwen2.5-VL. It lets users run multimodal tasks such as text generation, image understanding, and video analysis directly within ComfyUI workflows, for both creative and analytical pipelines.

How It Works

The node loads Qwen-VL models inside ComfyUI so that they can process both visual and textual inputs. It downloads models automatically from Hugging Face and supports on-the-fly quantization (4-bit, 8-bit, FP16) so that VRAM usage and performance can be matched to the available hardware. It can process either single images or sequences of video frames.
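To make the VRAM trade-off concrete, here is a minimal sketch of how a precision mode could be chosen from the available memory. It is not the node's actual logic: the function names, the 80% headroom factor, and the weights-only estimate (2, 1, and 0.5 bytes per parameter for FP16, 8-bit, and 4-bit) are assumptions made for illustration, and real usage also includes activations and cache memory.

```python
from typing import Optional

# Approximate bytes per parameter for each mode (weights only; an assumption
# for illustration, real quantized formats add some overhead).
BYTES_PER_PARAM = {"fp16": 2.0, "8bit": 1.0, "4bit": 0.5}


def estimate_weight_gib(params_billion: float, mode: str) -> float:
    """Weights-only memory estimate in GiB; ignores activations and KV cache."""
    return params_billion * 1e9 * BYTES_PER_PARAM[mode] / 2**30


def pick_mode(vram_gib: float, params_billion: float,
              headroom: float = 0.8) -> Optional[str]:
    """Return the highest-precision mode whose weights fit in headroom * VRAM.

    Returns None when even the 4-bit weights do not fit.
    """
    budget = vram_gib * headroom
    for mode in ("fp16", "8bit", "4bit"):  # highest precision first
        if estimate_weight_gib(params_billion, mode) <= budget:
            return mode
    return None


if __name__ == "__main__":
    # Hypothetical 7B-parameter model on GPUs of different sizes.
    for vram in (24, 12, 6, 2):
        print(vram, "GiB ->", pick_mode(vram, 7))
```

Under these assumptions a 7B-parameter model needs roughly 13 GiB for FP16 weights, so a 12 GiB card would fall back to 8-bit and a 6 GiB card to 4-bit.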

Quick Start & Requirements

  • Installation: Clone the repository into your ComfyUI/custom_nodes directory and install dependencies via pip install -r requirements.txt.
  • Prerequisites: A functional ComfyUI installation. Models are downloaded automatically on first use.
  • Links: GitHub Repository

Highlighted Details

  • Offers both Standard and Advanced nodes for varying levels of control.
  • Supports a preset and custom prompt system for flexible input.
  • Features automatic model downloading and on-the-fly quantization (4-bit, 8-bit, FP16).
  • Includes hardware-aware safeguards, particularly for FP8 model compatibility.
  • Accepts image and video (frame sequence) inputs.
  • A "Keep Model Loaded" option improves performance for sequential runs by keeping the model in memory between them.
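The "Keep Model Loaded" idea from the list above can be sketched as a small cache that reuses a loaded model when the same model and quantization are requested again. This is a hypothetical illustration, not the node's implementation: `ModelCache`, `get`, and the `loader` callback are invented names, and the loader stands in for whatever actually loads the weights.

```python
from typing import Any, Callable, Optional, Tuple


class ModelCache:
    """Holds at most one loaded model, keyed by (model name, quantization mode)."""

    def __init__(self, loader: Callable[[str, str], Any]) -> None:
        self._loader = loader  # hypothetical callback that loads a model
        self._key: Optional[Tuple[str, str]] = None
        self._model: Any = None

    def get(self, name: str, mode: str, keep_loaded: bool = True) -> Any:
        key = (name, mode)
        if self._model is not None and self._key == key:
            model = self._model  # cache hit: skip the expensive load
        else:
            model = self._loader(name, mode)  # cache miss: load from scratch
        if keep_loaded:
            self._key, self._model = key, model
        else:
            # Drop the reference so the memory can be reclaimed after this run.
            self._key, self._model = None, None
        return model


if __name__ == "__main__":
    load_calls = []

    def fake_loader(name: str, mode: str) -> object:
        # Stand-in for a real model load; records each call.
        load_calls.append((name, mode))
        return object()

    cache = ModelCache(fake_loader)
    cache.get("example-model", "4bit")
    cache.get("example-model", "4bit")  # reuses the cached model
    print("loads performed:", len(load_calls))
```

The trade-off is the usual one: keeping the model resident saves load time on repeated runs but holds VRAM that other nodes in the workflow might need.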

Maintenance & Community

No specific details regarding maintainers, community channels (e.g., Discord/Slack), or roadmap beyond completed features are provided in the README.

Licensing & Compatibility

Released under the GPL-3.0 License. This copyleft license permits commercial use, but distributed derivative works must also be released under GPL-3.0, which can restrict use in closed-source applications.

Limitations & Caveats

Support for the GGUF format, intended to broaden CPU and hardware compatibility, is listed as a future plan and is not currently available. The README lists no other known limitations and does not state whether the project is in alpha.

Health Check

  • Last Commit: 2 days ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 24
  • Star History: 169 stars in the last 30 days
