ComfyUI node for multimodal analysis using Gemini Flash 2.0 Experimental
Top 87.8% on sourcepulse
This ComfyUI custom node integrates Google's Gemini Flash 2.0 Experimental model, offering multimodal analysis of text, images, video frames, and audio, plus image generation. It targets ComfyUI users seeking to leverage advanced AI capabilities within their visual workflows, providing a powerful tool for content analysis and creative generation.
How It Works
The node calls Google's Gemini API to process text, image, video frame sequence, and audio inputs, enabling multimodal analysis; image generation uses a separate experimental model. Key features include a chat mode with conversation history, structured output options, configurable API settings, and proxy support, allowing flexible, context-aware interactions.
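To make the request flow concrete, here is a minimal sketch of how a multimodal Gemini request can be assembled. The payload shape follows the public generateContent REST format (a text part plus an optional inline image part); the function name and model string are illustrative assumptions, not the node's actual code.

```python
# Sketch of assembling a multimodal generateContent-style payload.
# MODEL and build_request are illustrative, not the node's real identifiers.
import base64

MODEL = "gemini-2.0-flash-exp"  # experimental model the node targets

def build_request(prompt, image_bytes=None, mime_type="image/png"):
    """Build a generateContent-style payload: one text part, plus an
    optional base64-encoded inline image part."""
    parts = [{"text": prompt}]
    if image_bytes is not None:
        parts.append({
            "inline_data": {
                "mime_type": mime_type,
                "data": base64.b64encode(image_bytes).decode("ascii"),
            }
        })
    return {"model": MODEL, "contents": [{"role": "user", "parts": parts}]}

# With the google-genai SDK installed, the equivalent call is roughly:
#   client = genai.Client(api_key=...)
#   response = client.models.generate_content(model=MODEL, contents=[...])
```

The same parts list extends naturally to multiple video frames (one inline image part per frame) or an audio part with an audio MIME type.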
Quick Start & Requirements
- Clone this repository into your ComfyUI/custom_nodes directory.
- Install the Python dependencies: pip install google-genai google-generativeai pillow torchaudio
- On Linux, audio input may also require: sudo apt-get install libportaudio2
- Provide your Gemini API key in config.json or directly in the node's GUI.
Highlighted Details
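For reference, a config.json for this kind of node typically holds just the API key. The exact schema is defined by the project, so the key name below is an assumption, not the node's documented format:

```json
{
  "GEMINI_API_KEY": "your-api-key-here"
}
```

If the key is left empty here, it can instead be entered directly in the node's GUI, which the troubleshooting notes suggest is more reliable on Ubuntu/WSL.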
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
This node relies on an experimental Gemini model, so features and capabilities are subject to change without notice. The troubleshooting notes also report cross-platform issues, particularly with API key handling on Ubuntu/WSL, where entering the key directly in the node's GUI is more reliable than loading it from config.json.