ComfyUI-Gemini by ZHO-ZHO-ZHO

ComfyUI integration for Google's Gemini models

Created 2 years ago

781 stars

Top 44.9% on SourcePulse

Project Summary

This repository provides custom nodes for ComfyUI, enabling users to integrate Google's Gemini large language models. It targets AI artists, researchers, and developers using ComfyUI for generative tasks, offering enhanced prompt generation, image description, and conversational AI capabilities directly within their existing workflows.

How It Works

The nodes leverage the Gemini API to interact with three models: Gemini-pro (text), Gemini-pro-vision (text + image), and Gemini 1.5 Pro (text + image + file). It supports both implicit API key management (via environment variables for security) and explicit key input. Key features include multimodal input (images, URLs, and large files up to 20GB for Gemini 1.5 Pro), system instruction support, and conversational memory for chatbot functionalities.

Quick Start & Requirements

Install: Recommended via ComfyUI Manager. Manual install: cd custom_nodes && git clone https://github.com/ZHO-ZHO-ZHO/ComfyUI-Gemini.git && cd ComfyUI-Gemini && pip install -r requirements.txt.
Prerequisites: Python, ComfyUI. Google Gemini API Key required.
Dependencies: google-generativeai (version > 0.4.1 recommended for Gemini 1.5 Pro).
Links: Gemini API Application

Highlighted Details

Integrates Gemini 1.5 Pro with a 1 million token context window and multimodal file support (video, audio).
Offers nodes for batch image labeling using Gemini Pro Vision.
Includes a "DALL-E 3 alternative" workflow using Gemini 1.5 Pro with Stable Diffusion.
Supports system instructions and multi-turn conversations.

Maintenance & Community

Active development with recent updates (V3.0 adding Gemini 1.5 Pro).
Community support via QQ group (839821928).
Developer contact: zhozho3965@gmail.com.

Licensing & Compatibility

The repository itself does not explicitly state a license. ComfyUI is typically under an MIT license. Gemini API usage is subject to Google's terms of service.

Limitations & Caveats

Gemini API has rate limits (2 requests/minute, 1000 requests/day).
File upload currently supports single files; multi-file upload (for video) is pending.
Explicit API key usage in workflows poses a security risk if shared.

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Pull Requests (30d)

0

Issues (30d)

0

Star History

4 stars in the last 30 days

Explore Similar Projects

Comfyui_Comfly by ainewsto

ComfyUI extension for AI image/video generation workflows

Created 1 year ago

Updated 3 weeks ago

ComfyUI-Gemini_Flash_2.0_Exp by ShmuelRonen

ComfyUI node for multimodal analysis using Gemini Flash 2.0 Experimental

Created 1 year ago

Updated 10 months ago

chatwise-releases by egoist

AI Chatbot for local LLM use

Created 1 year ago

Updated 1 day ago

client by google-gemini-php

PHP API client for interacting with the Gemini AI API

Created 2 years ago

Updated 1 month ago

comfyui-mixlab-nodes by MixLabPro

ComfyUI extension for workflow-to-app conversion and more

Created 2 years ago

Updated 7 months ago

Starred by

Jeffrey Morgan

Jeffrey Morgan(Cofounder of Ollama).

witsy by nbonamy

Desktop AI assistant for universal model control

Created 1 year ago

Updated 1 day ago

chat_gpt_sdk by redevrx

Flutter SDK for OpenAI APIs

Created 3 years ago

Updated 4 months ago

Starred by

Michael Han

Michael Han(Cofounder of Unsloth),

Tim Suchanek

Tim Suchanek(Founder of expand.ai), and

7 more.

cactus by cactus-compute

Framework for on-device AI, targeting mobile and wearables

Created 10 months ago

Updated 1 day ago

gemini-next-chat by u14app

Gemini chatbot web app with one-click deployment

Created 2 years ago

Updated 2 months ago

Starred by

Jeff Hammerbacher

Jeff Hammerbacher(Cofounder of Cloudera),

Philipp Schmid

Philipp Schmid(DevRel at Google DeepMind), and

1 more.

python-genai by googleapis

Python SDK for Google GenAI enables integration of generative models

Created 1 year ago

Updated 1 day ago

chatgpt-web-midjourney-proxy by Dooy

One-UI for multimodal AI tasks

Created 2 years ago

Updated 3 weeks ago

kirara-ai by lss233

DIY chatbot for multiple platforms

Created 3 years ago

Updated 8 months ago

Feedback? Help us improve.