ComfyUI-Gemini_Flash_2.0_Exp  by ShmuelRonen

ComfyUI node for multimodal analysis using Gemini Flash 2.0 Experimental

created 7 months ago
310 stars

Top 87.8% on sourcepulse

GitHubView on GitHub
Project Summary

This ComfyUI custom node integrates Google's Gemini Flash 2.0 Experimental model, offering multimodal analysis of text, images, video frames, and audio, plus image generation. It targets ComfyUI users seeking to leverage advanced AI capabilities within their visual workflows, providing a powerful tool for content analysis and creative generation.

How It Works

The node leverages Google's Gemini API to process various input types. It supports text, image, video frame sequences, and audio inputs, enabling multimodal analysis. For image generation, it utilizes a specific experimental model. Key features include chat mode with conversation history, structured output options, configurable API settings, and proxy support, allowing for flexible and context-aware interactions.

Quick Start & Requirements

  • Install via ComfyUI manager or clone the repository into ComfyUI/custom_nodes.
  • Install dependencies: pip install google-genai google-generativeai pillow torchaudio.
  • For Ubuntu/Debian: sudo apt-get install libportaudio2.
  • Requires a Google AI Studio API key.
  • Configuration is done via config.json or directly in the node's GUI.
  • Official Docs: Google AI Studio

Highlighted Details

  • Supports multimodal input: text, image, video frames, and audio.
  • Includes experimental image generation capabilities.
  • Features a "Smart Audio Recorder" node with silence detection for audio input.
  • Offers chat mode with conversation history and configurable parameters like temperature and token limits.

Maintenance & Community

  • The project is open for contributions via issues, forks, and pull requests.
  • No specific community links (Discord/Slack) or major contributor information are provided in the README.

Licensing & Compatibility

  • MIT License.
  • Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

This node utilizes an experimental Gemini model, meaning features and capabilities are subject to change. Troubleshooting notes indicate potential cross-platform issues, particularly with API key handling on Ubuntu/WSL, suggesting direct GUI input is more reliable in those environments.

Health Check
Last commit

3 months ago

Responsiveness

1 day

Pull Requests (30d)
0
Issues (30d)
1
Star History
23 stars in the last 90 days

Explore Similar Projects

Feedback? Help us improve.