gemimg  by minimaxir

Generate and edit images with Gemini API

Created 4 months ago
333 stars

Top 82.5% on SourcePulse

Project Summary

gemimg: Lightweight Gemini API Image Generation Wrapper

This Python package provides a lightweight interface to Google's Gemini API, specifically targeting the Nano Banana (Gemini 2.5 Flash Image) and Nano Banana Pro models. It gives developers and power users programmatic control over image generation and editing. Compared with web-based interfaces, it avoids watermarks and accepts more complex inputs.

How It Works

gemimg acts as a thin wrapper around the Gemini API, eschewing Google's official Client SDK for minimal dependencies. It directly handles image input/output, encoding/decoding, and saving, abstracting away much of the complexity. The core approach leverages Gemini's advanced multimodal text encoder and long context window, allowing for highly nuanced prompt engineering, including detailed compositional requirements and multi-image compositing, leading to more accurate and controllable image generation.
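
To make the "thin wrapper" idea concrete, here is a minimal sketch of what talking to the Gemini REST endpoint without the official SDK can look like: build a JSON body from a text prompt plus base64-encoded input images, then decode any inline images found in the response. This is not gemimg's actual source. The function names build_request and extract_images are hypothetical, the model id gemini-2.5-flash-image is an assumption, and the sketch stops short of sending the request (which would also need the API key, for example in an x-goog-api-key header).

```python
import base64
import json

# REST endpoint pattern for generateContent; the model id below is an assumption.
API_URL = (
    "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"
)


def build_request(prompt, images=(), model="gemini-2.5-flash-image"):
    """Hypothetical helper: return (url, json_body) for a prompt plus PNG inputs."""
    parts = [{"text": prompt}]
    for png_bytes in images:
        # Each input image travels inline as base64 text.
        parts.append(
            {
                "inline_data": {
                    "mime_type": "image/png",
                    "data": base64.b64encode(png_bytes).decode("ascii"),
                }
            }
        )
    body = {"contents": [{"parts": parts}]}
    return API_URL.format(model=model), json.dumps(body)


def extract_images(response):
    """Hypothetical helper: decode every inline image in a parsed response dict."""
    decoded = []
    for candidate in response.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            # Accept both camelCase and snake_case spellings of the field.
            inline = part.get("inlineData") or part.get("inline_data")
            if inline:
                decoded.append(base64.b64decode(inline["data"]))
    return decoded
```

Passing several byte strings in images is how a multi-image compositing request would be assembled under these assumptions; the decoded bytes can then be opened with an imaging library and saved.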

Quick Start & Requirements

  • Install: pip3 install gemimg
  • Prerequisites: A Gemini API key is required. It can be provided via the GEMINI_API_KEY environment variable, a .env file, or directly in the code. Billing must be enabled on the associated GCP project.
  • Output: Generated images are returned as PIL.Image objects.
  • Links: Jupyter Notebooks demonstrating advanced use cases are mentioned but not directly linked.
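
The three ways of supplying the API key listed above can be sketched as a small lookup function. This is an illustration only: resolve_api_key is a hypothetical helper, the precedence order (explicit argument, then environment variable, then .env file) is an assumption, and the .env parsing is deliberately simplistic compared with a dedicated dotenv library.

```python
import os
from pathlib import Path


def resolve_api_key(explicit=None, env_file=".env", name="GEMINI_API_KEY"):
    """Hypothetical helper: find the key from an argument, the environment, or a .env file."""
    # 1. A key passed directly in code wins (assumed precedence).
    if explicit:
        return explicit
    # 2. Otherwise fall back to the environment variable.
    if os.environ.get(name):
        return os.environ[name]
    # 3. Finally, scan a simple KEY=value .env file if one exists.
    path = Path(env_file)
    if path.is_file():
        for line in path.read_text().splitlines():
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            if key.strip() == name:
                return value.strip().strip("'\"")
    raise RuntimeError(f"{name} not found in argument, environment, or {env_file}")
```

Keeping the key in the environment or a .env file, rather than in source code, avoids committing it to version control by accident.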

Highlighted Details

  • Supports generating images in various aspect ratios with simple text prompts.
  • Enables complex image editing and compositing by accepting multiple input images.
  • Facilitates ControlNet-like image generation by using an input image for pose or structural guidance.
  • Offers a convenient Command-Line Interface for direct image generation without Python scripting.
  • Markdown formatting within prompts significantly enhances control over subject details and composition.
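
As an illustration of the Markdown-in-prompts point, the sketch below assembles a prompt whose requirements are laid out as a Markdown bullet list. The helper name build_prompt and the exact wording of the rules header are assumptions made for this example, not something prescribed by gemimg; the resulting string is what would be passed as the text prompt.

```python
def build_prompt(task, requirements):
    """Hypothetical helper: join a task sentence and rules into a Markdown-bulleted prompt."""
    lines = [task.strip(), "", "The image MUST obey ALL of the following rules:"]
    # One Markdown bullet per requirement keeps each constraint distinct.
    lines += [f"- {rule.strip()}" for rule in requirements]
    return "\n".join(lines)


prompt = build_prompt(
    "Create an image of a kitten.",
    [
        "The kitten has purple-and-green fur.",
        "The kitten sits in the left third of the frame.",
    ],
)
print(prompt)
```

Listing each compositional requirement on its own line makes it easy to add, remove, or reorder constraints between runs.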

Maintenance & Community

  • Maintainer: Max Woolf (@minimaxir).
  • Support: The project is supported via Patreon and GitHub Sponsors.

Licensing & Compatibility

  • License: MIT.
  • Compatibility: The MIT license permits commercial use and integration into closed-source projects.

Limitations & Caveats

  • The underlying Gemini 2.5 Flash Image model does not support direct style transfer.
  • Free-form text generation within images is unreliable; the recommended workaround is to render the text separately and supply it as an input image for compositing.
  • System prompts have no effect, even though the API schema suggests they are supported.
  • By default, input images are resized to a maximum dimension of 1024px for efficient processing; this behavior can be disabled.
  • The package intentionally omits multi-turn conversations and text output to stay lightweight.
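
The default 1024px cap on input images comes down to simple arithmetic on the image dimensions. The sketch below shows one way to compute the target size while preserving aspect ratio. The function name fit_within is hypothetical, and the choices to round to the nearest pixel and to leave smaller images untouched are assumptions; gemimg's own resizing may differ in those details.

```python
def fit_within(width, height, max_dim=1024):
    """Hypothetical helper: scale (width, height) so the longer side is at most max_dim."""
    longest = max(width, height)
    # Images already within the limit are left unchanged (assumed: no upscaling).
    if longest <= max_dim:
        return width, height
    scale = max_dim / longest
    # Round to whole pixels and keep each side at least 1px.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(2048, 1024))  # (1024, 512)
print(fit_within(800, 600))    # (800, 600)
```

Capping the longer side rather than each side independently keeps the original proportions intact, which matters when the input image is used for pose or structural guidance.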

Health Check

  • Last Commit: 1 month ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 17 stars in the last 30 days
