ComfyUI-Gemini  by ZHO-ZHO-ZHO

ComfyUI integration for Google's Gemini models

created 1 year ago
767 stars

Top 46.4% on sourcepulse

GitHubView on GitHub
Project Summary

This repository provides custom nodes for ComfyUI, enabling users to integrate Google's Gemini large language models. It targets AI artists, researchers, and developers using ComfyUI for generative tasks, offering enhanced prompt generation, image description, and conversational AI capabilities directly within their existing workflows.

How It Works

The nodes leverage the Gemini API to interact with three models: Gemini-pro (text), Gemini-pro-vision (text + image), and Gemini 1.5 Pro (text + image + file). It supports both implicit API key management (via environment variables for security) and explicit key input. Key features include multimodal input (images, URLs, and large files up to 20GB for Gemini 1.5 Pro), system instruction support, and conversational memory for chatbot functionalities.

Quick Start & Requirements

  • Install: Recommended via ComfyUI Manager. Manual install: cd custom_nodes && git clone https://github.com/ZHO-ZHO-ZHO/ComfyUI-Gemini.git && cd ComfyUI-Gemini && pip install -r requirements.txt.
  • Prerequisites: Python, ComfyUI. Google Gemini API Key required.
  • Dependencies: google-generativeai (version > 0.4.1 recommended for Gemini 1.5 Pro).
  • Links: Gemini API Application

Highlighted Details

  • Integrates Gemini 1.5 Pro with a 1 million token context window and multimodal file support (video, audio).
  • Offers nodes for batch image labeling using Gemini Pro Vision.
  • Includes a "DALL-E 3 alternative" workflow using Gemini 1.5 Pro with Stable Diffusion.
  • Supports system instructions and multi-turn conversations.

Maintenance & Community

  • Active development with recent updates (V3.0 adding Gemini 1.5 Pro).
  • Community support via QQ group (839821928).
  • Developer contact: zhozho3965@gmail.com.

Licensing & Compatibility

  • The repository itself does not explicitly state a license. ComfyUI is typically under an MIT license. Gemini API usage is subject to Google's terms of service.

Limitations & Caveats

  • Gemini API has rate limits (2 requests/minute, 1000 requests/day).
  • File upload currently supports single files; multi-file upload (for video) is pending.
  • Explicit API key usage in workflows poses a security risk if shared.
Health Check
Last commit

1 year ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
2
Star History
17 stars in the last 90 days

Explore Similar Projects

Feedback? Help us improve.