ComfyUI_Qwen3-VL-Instruct  by IuvenisSapiens

Multimodal AI for ComfyUI

Created 1 year ago
276 stars

Top 93.8% on SourcePulse

GitHubView on GitHub
Project Summary

This ComfyUI custom node integrates the Qwen3-VL-Instruct multimodal model, enabling users to generate captions and responses from diverse inputs including text, single images, multiple images, and video. It targets ComfyUI users seeking to leverage advanced visual-language understanding within their existing node-based workflows for tasks like image description, video analysis, and multi-image storytelling.

How It Works

The node acts as an interface to the Qwen3-VL-Instruct model, processing user-provided text prompts, single or multiple images, and video files. It analyzes these inputs to generate relevant textual outputs, such as detailed captions for images or videos, or narrative summaries that connect a series of images. The core advantage lies in bringing sophisticated multimodal AI capabilities directly into the ComfyUI ecosystem.

Quick Start & Requirements

  • Installation: Install via ComfyUI Manager (search for "Qwen3") or clone the repository into ComfyUI/custom_nodes/ and execute pip install -r requirements.txt.
  • Prerequisites: A "Display Text node" is mandatory. If absent, it must be installed from the ComfyUI_MiniCPM-V-4_5 repository.
  • Models: Models are automatically downloaded to ComfyUI/models/prompt_generator/ upon first use if not present.

Highlighted Details

  • Comprehensive multimodal support: text, single image, multiple images, and video queries.
  • Generates descriptive captions and narrative responses tailored to input types.
  • Designed for seamless integration within the ComfyUI graphical interface.

Maintenance & Community

  • The provided README does not contain information regarding maintainers, community channels (e.g., Discord, Slack), or a public roadmap.

Licensing & Compatibility

  • The README does not specify the software license or any compatibility notes for commercial or closed-source use.

Limitations & Caveats

  • Setup requires ensuring the "Display Text node" is available, potentially necessitating an additional installation.
  • The README lacks details on performance metrics, specific hardware requirements (beyond general ComfyUI needs), or known bugs.
Health Check
Last Commit

1 week ago

Responsiveness

Inactive

Pull Requests (30d)
4
Issues (30d)
17
Star History
159 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen Chip Huyen(Author of "AI Engineering", "Designing Machine Learning Systems"), Elvis Saravia Elvis Saravia(Founder of DAIR.AI), and
1 more.

InternGPT by OpenGVLab

0.0%
3k
Interactive demo platform for showcasing AI models
Created 2 years ago
Updated 1 year ago
Feedback? Help us improve.