IuvenisSapiens: Multimodal AI for ComfyUI
Top 93.8% on SourcePulse
This ComfyUI custom node integrates the Qwen3-VL-Instruct multimodal model, enabling users to generate captions and responses from diverse inputs including text, single images, multiple images, and video. It targets ComfyUI users seeking to leverage advanced visual-language understanding within their existing node-based workflows for tasks like image description, video analysis, and multi-image storytelling.
How It Works
The node acts as an interface to the Qwen3-VL-Instruct model, processing user-provided text prompts, single or multiple images, and video files. It analyzes these inputs to generate relevant textual outputs, such as detailed captions for images or videos, or narrative summaries that connect a series of images. The core advantage lies in bringing sophisticated multimodal AI capabilities directly into the ComfyUI ecosystem.
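To make the node-based integration concrete, here is a minimal sketch of how a ComfyUI custom node wrapping a vision-language model is typically structured. The class name, socket names, and the placeholder `generate` body are illustrative assumptions, not the actual IuvenisSapiens implementation; a real node would forward the prompt and image tensors to Qwen3-VL-Instruct and decode the generated tokens.

```python
# Sketch of a ComfyUI custom node exposing a caption/response generator.
# All names here are hypothetical; the real node's inputs and outputs may differ.

class Qwen3VLCaptionNode:
    """Generates a textual response from a prompt plus optional image batch."""

    @classmethod
    def INPUT_TYPES(cls):
        # ComfyUI reads this dict to build the node's input sockets.
        return {
            "required": {
                "prompt": ("STRING", {"multiline": True,
                                      "default": "Describe the image."}),
            },
            "optional": {
                "image": ("IMAGE",),  # single image or a batch of images
            },
        }

    RETURN_TYPES = ("STRING",)
    FUNCTION = "generate"      # method ComfyUI calls when the node executes
    CATEGORY = "multimodal"

    def generate(self, prompt, image=None):
        # A real implementation would run Qwen3-VL-Instruct here.
        # This placeholder keeps the sketch self-contained and runnable.
        source = "image batch" if image is not None else "text only"
        return (f"[caption from {source} for prompt: {prompt!r}]",)


# ComfyUI discovers custom nodes through these module-level mappings.
NODE_CLASS_MAPPINGS = {"Qwen3VLCaption": Qwen3VLCaptionNode}
NODE_DISPLAY_NAME_MAPPINGS = {"Qwen3VLCaption": "Qwen3-VL Caption"}
```

Because the node returns a plain `STRING`, its output can be wired into any downstream ComfyUI node that accepts text, such as a prompt concatenator or a save-text node.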
Quick Start & Requirements
- Install the node into ComfyUI/custom_nodes/ and execute pip install -r requirements.txt.
- Related: the ComfyUI_MiniCPM-V-4_5 repository.
- Model weights are downloaded to ComfyUI/models/prompt_generator/ upon first use if not present.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats