ComfyUI-Gemini_Flash_2.0_Exp by ShmuelRonen

ComfyUI node for multimodal analysis using Gemini Flash 2.0 Experimental

Created 1 year ago

334 stars

Top 82.2% on SourcePulse

Project Summary

This ComfyUI custom node integrates Google's Gemini Flash 2.0 Experimental model, offering multimodal analysis of text, images, video frames, and audio, plus image generation. It targets ComfyUI users seeking to leverage advanced AI capabilities within their visual workflows, providing a powerful tool for content analysis and creative generation.

How It Works

The node leverages Google's Gemini API to process various input types. It supports text, image, video frame sequences, and audio inputs, enabling multimodal analysis. For image generation, it utilizes a specific experimental model. Key features include chat mode with conversation history, structured output options, configurable API settings, and proxy support, allowing for flexible and context-aware interactions.

Quick Start & Requirements

Install via ComfyUI manager or clone the repository into ComfyUI/custom_nodes.
Install dependencies: pip install google-genai google-generativeai pillow torchaudio.
For Ubuntu/Debian: sudo apt-get install libportaudio2.
Requires a Google AI Studio API key.
Configuration is done via config.json or directly in the node's GUI.
Official Docs: Google AI Studio

Highlighted Details

Supports multimodal input: text, image, video frames, and audio.
Includes experimental image generation capabilities.
Features a "Smart Audio Recorder" node with silence detection for audio input.
Offers chat mode with conversation history and configurable parameters like temperature and token limits.

Maintenance & Community

The project is open for contributions via issues, forks, and pull requests.
No specific community links (Discord/Slack) or major contributor information are provided in the README.

Licensing & Compatibility

MIT License.
Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

This node utilizes an experimental Gemini model, meaning features and capabilities are subject to change. Troubleshooting notes indicate potential cross-platform issues, particularly with API key handling on Ubuntu/WSL, suggesting direct GUI input is more reliable in those environments.

Health Check

Last Commit

8 months ago

Responsiveness

Inactive

Pull Requests (30d)

Issues (30d)

Star History

2 stars in the last 30 days