nanobanana-mcp-server by zhongweili

AI image generation server with intelligent model selection

Created 10 months ago

374 stars

Top 75.6% on SourcePulse

Project Summary

This project provides a production-ready Model Context Protocol (MCP) server for AI-powered image generation, leveraging Google's Gemini models. It targets developers and power users seeking to integrate advanced image synthesis capabilities into their workflows, offering intelligent model selection, high-resolution output, and features like real-world grounding for enhanced accuracy and consistency.

How It Works

The server acts as an MCP endpoint, routing requests to Google's Gemini models. Its "Smart Model Selection" feature automatically directs each prompt either to Gemini 3.1 Flash Image (NB2), for balanced speed and quality, or to Gemini 3 Pro Image, for complex reasoning tasks. Routing this way keeps simple requests fast while still exposing advanced capabilities such as Google Search Grounding for factual accuracy, subject consistency across multiple elements, and precise text rendering within generated images.
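The routing idea can be illustrated with a minimal sketch. It is not the server's actual logic: the function name, the keyword list, the word-count threshold, and the "nb2"/"pro" labels are all hypothetical, chosen only to show how prompt complexity, quality keywords, and a requested thinking level might feed a single decision.

```python
# Illustrative sketch only; not the project's real routing code.
# All names, keywords, and thresholds below are assumptions.

# Hypothetical keywords that hint the caller wants maximum quality.
QUALITY_KEYWORDS = ("4k", "professional", "photorealistic", "high quality")


def select_model(prompt: str, thinking_level: str = "low",
                 max_simple_words: int = 40) -> str:
    """Return "pro" for demanding requests, otherwise "nb2"."""
    text = prompt.lower()

    # An explicitly requested high thinking level goes straight to Pro.
    if thinking_level == "high":
        return "pro"

    # Quality keywords in the prompt also select Pro.
    if any(keyword in text for keyword in QUALITY_KEYWORDS):
        return "pro"

    # Very long prompts are treated as complex (a crude proxy).
    if len(text.split()) > max_simple_words:
        return "pro"

    # Everything else uses the faster default tier.
    return "nb2"


if __name__ == "__main__":
    print(select_model("a red apple on a table"))
    print(select_model("photorealistic portrait of a cat"))
```

A rule-based heuristic like this is cheap and predictable; the real server may weigh these signals differently.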

Quick Start & Requirements

Installation: Recommended via MCP Registry (io.github.zhongweili/nanobanana-mcp-server), uvx nanobanana-mcp-server@latest, or pip install nanobanana-mcp-server.
Prerequisites: A Google Gemini API Key (obtainable from Google AI Studio) or Google Cloud Application Default Credentials (ADC) for Vertex AI authentication. Python 3.11+ is required for development.
Configuration: Authentication is managed via NANOBANANA_AUTH_METHOD (API Key, Vertex AI, or auto). API Key authentication requires setting the GEMINI_API_KEY environment variable. Vertex AI requires NANOBANANA_AUTH_METHOD=vertex_ai, GCP_PROJECT_ID, and GCP_REGION.
Documentation: Google AI Studio for API keys.
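As a rough illustration of the configuration rules above, the following sketch checks which authentication method a given environment would satisfy. The environment variable names and the `vertex_ai` and `auto` values come from the description above; the `api_key` value string, the default of `auto` when the variable is unset, the preference order in auto mode, and the function name `resolve_auth` are assumptions made for the example.

```python
import os
from typing import Mapping


def resolve_auth(env: Mapping[str, str] = os.environ) -> str:
    """Return "api_key" or "vertex_ai", or raise ValueError if misconfigured.

    Hypothetical helper; it does not call the server or any Google API.
    """
    # Assumption: an unset NANOBANANA_AUTH_METHOD behaves like "auto".
    method = env.get("NANOBANANA_AUTH_METHOD", "auto").strip().lower()

    has_key = bool(env.get("GEMINI_API_KEY"))
    has_vertex = bool(env.get("GCP_PROJECT_ID")) and bool(env.get("GCP_REGION"))

    if method == "api_key":  # assumed spelling of the API Key option
        if not has_key:
            raise ValueError("GEMINI_API_KEY is not set")
        return "api_key"

    if method == "vertex_ai":
        if not has_vertex:
            raise ValueError("GCP_PROJECT_ID and GCP_REGION must both be set")
        return "vertex_ai"

    if method == "auto":
        # Assumption: prefer the API key when both are available.
        if has_key:
            return "api_key"
        if has_vertex:
            return "vertex_ai"
        raise ValueError("no usable credentials found in the environment")

    raise ValueError(f"unknown NANOBANANA_AUTH_METHOD: {method!r}")


if __name__ == "__main__":
    try:
        print(resolve_auth())
    except ValueError as exc:
        print(f"configuration problem: {exc}")
```

Running a check like this before launching the server can surface a missing variable early, rather than at the first image request.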

Highlighted Details

Multi-Model Support: Integrates Gemini 3.1 Flash Image (NB2; the default, with 4K output and fast generation), Gemini 3 Pro Image (maximum reasoning), and Gemini 2.5 Flash (legacy, for rapid prototyping).
Intelligent Model Routing: Automatically selects NB2 or Pro based on prompt complexity, quality keywords, and requested thinking levels.
Advanced Features: Includes Google Search Grounding, subject consistency (up to 5 characters, 14 objects), precision text rendering, and aspect ratio control (1:1, 16:9, 9:16, 21:9, etc.).
Output Management: Supports specifying exact file paths or directories for generated images via the output_path parameter.
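The output_path behaviour described above (either an exact file path or a directory) could be resolved along the following lines. This is a sketch under stated assumptions, not the server's implementation: the helper name, the default file name, and the rule used to tell a directory from a file are all hypothetical.

```python
from pathlib import Path


def resolve_output_path(output_path: str, default_name: str = "image.png") -> Path:
    """Map a user-supplied output_path to a concrete file path.

    Assumed rule: a trailing separator, an existing directory, or a path
    without a file extension is treated as a directory; anything else is
    treated as an exact file path.
    """
    path = Path(output_path).expanduser()

    looks_like_directory = (
        output_path.endswith(("/", "\\"))
        or path.is_dir()
        or path.suffix == ""
    )
    if looks_like_directory:
        # Place a default-named file inside the directory.
        return path / default_name

    # Otherwise honour the exact file path the caller asked for.
    return path


if __name__ == "__main__":
    print(resolve_output_path("renders/"))
    print(resolve_output_path("renders/cat.png"))
```

A real implementation would likely also create missing parent directories and avoid overwriting existing files; those steps are left out here to keep the sketch short.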

Maintenance & Community

Support and discussions are primarily handled through GitHub Issues and Discussions for the repository.

Licensing & Compatibility

The project is released under the MIT License, permitting commercial use and integration into closed-source applications, provided the terms of the license are met.

Limitations & Caveats

Authentication requires either a Google Gemini API key or a Google Cloud Vertex AI setup. The legacy Gemini 2.5 Flash model is limited to 1024px resolution. Local development requires cloning the repository and managing Python dependencies.

Health Check

Last Commit

1 month ago

Responsiveness

Inactive

Star History

12 stars in the last 30 days