gemimg by minimaxir

Generate and edit images with Gemini API

Created 6 months ago

345 stars

Top 80.6% on SourcePulse

Project Summary

gemimg: Lightweight Gemini API Image Generation Wrapper

This Python package provides a lightweight interface to Google's Gemini API, specifically targeting the Nano Banana (Gemini 2.5 Flash Image) and Nano Banana Pro models. It gives developers and power users programmatic control over image generation and editing. Compared with web-based interfaces, it avoids watermarks and accepts more complex inputs, which makes it better suited to advanced image manipulation tasks.

How It Works

gemimg is a thin wrapper around the Gemini API that bypasses Google's official Client SDK to keep dependencies minimal. It handles image input/output, encoding/decoding, and saving directly, hiding most of that complexity from the caller. The approach relies on Gemini's multimodal text encoder and long context window: prompts can carry detailed compositional requirements and several input images for compositing, which makes generation more accurate and controllable.
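To make the "thin wrapper without the SDK" idea concrete, here is a minimal sketch of the same pattern using only the standard library. It is not gemimg's actual implementation. The endpoint URL, the `gemini-2.5-flash-image` model id, and the JSON field names follow the public Gemini REST API as best understood here and should be treated as assumptions; `build_payload`, `extract_images`, and `generate` are hypothetical helper names.

```python
import base64
import json
import urllib.request

# Assumptions: public Gemini REST endpoint and model id; gemimg's internals may differ.
API_URL = "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"
DEFAULT_MODEL = "gemini-2.5-flash-image"


def build_payload(prompt, image_bytes_list=(), mime_type="image/png"):
    """Build a request body: one text part plus optional base64-encoded input images."""
    parts = [{"text": prompt}]
    for raw in image_bytes_list:
        parts.append({
            "inline_data": {
                "mime_type": mime_type,
                "data": base64.b64encode(raw).decode("ascii"),
            }
        })
    return {"contents": [{"parts": parts}]}


def extract_images(response_json):
    """Decode every inline image found in a response into raw bytes."""
    images = []
    for candidate in response_json.get("candidates", []):
        for part in candidate.get("content", {}).get("parts", []):
            # Accept both camelCase and snake_case spellings of the field.
            inline = part.get("inlineData") or part.get("inline_data")
            if inline and "data" in inline:
                images.append(base64.b64decode(inline["data"]))
    return images


def generate(prompt, api_key, model=DEFAULT_MODEL, image_bytes_list=()):
    """POST the payload and return a list of image byte strings (requires network access)."""
    request = urllib.request.Request(
        API_URL.format(model=model),
        data=json.dumps(build_payload(prompt, image_bytes_list)).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return extract_images(json.load(response))
```

Keeping payload construction and response decoding in separate functions lets both be checked offline; only `generate` touches the network.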

Quick Start & Requirements

Install: pip3 install gemimg
Prerequisites: A Gemini API key is required. It can be provided via the GEMINI_API_KEY environment variable, a .env file, or directly in the code. Billing must be enabled on the associated GCP project.
Output: Generated images are returned as PIL.Image objects.
Links: Jupyter Notebooks demonstrating advanced use cases are mentioned but not directly linked.
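The three ways of supplying the key listed above can be illustrated with a small standard-library sketch. The lookup order shown here (explicit argument, then environment variable, then `.env` file) is an assumption for illustration, not a documented gemimg behavior, and `read_dotenv` and `resolve_api_key` are hypothetical helpers rather than part of the package.

```python
import os
from pathlib import Path


def read_dotenv(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file; return {} if the file is missing."""
    values = {}
    file = Path(path)
    if not file.is_file():
        return values
    for line in file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_api_key(explicit=None, dotenv_path=".env"):
    """Assumed precedence: explicit argument, then environment, then .env file."""
    key = (
        explicit
        or os.environ.get("GEMINI_API_KEY")
        or read_dotenv(dotenv_path).get("GEMINI_API_KEY")
    )
    if not key:
        raise ValueError("No Gemini API key found; set GEMINI_API_KEY or pass one explicitly.")
    return key
```

Failing early with a clear error when no key is found avoids a less readable authentication error from the API later.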

Highlighted Details

Supports generating images in various aspect ratios with simple text prompts.
Enables complex image editing and compositing by accepting multiple input images.
Facilitates ControlNet-like image generation by using an input image for pose or structural guidance.
Offers a convenient Command-Line Interface for direct image generation without Python scripting.
Markdown formatting within prompts significantly enhances control over subject details and composition.
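As an illustration of the Markdown-formatted prompts mentioned in the last item, the sketch below assembles a prompt from named requirement lists. The helper `build_markdown_prompt`, the section names, and the example requirements are all hypothetical; the resulting string would simply be passed as the text prompt.

```python
def build_markdown_prompt(instruction, sections):
    """Render an instruction plus named requirement lists as a Markdown prompt."""
    lines = [instruction.strip(), ""]
    for heading, items in sections.items():
        lines.append(f"## {heading}")
        lines.extend(f"- {item}" for item in items)
        lines.append("")  # blank line between sections
    return "\n".join(lines).strip()


prompt = build_markdown_prompt(
    "Create an image featuring the subject described below.",
    {
        "Subject": [
            "A kitten with purple-and-green fur",
            "Sitting upright, facing the camera",
        ],
        "Composition": [
            "Subject placed in the left third of the frame",
            "Soft window light from the right",
        ],
    },
)
print(prompt)
```

Grouping requirements under headings keeps subject details and composition rules separate, which is the kind of structure the summary credits with improving control.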

Maintenance & Community

Maintainer: Max Woolf (@minimaxir).
Support: The project is supported via Patreon and GitHub Sponsors.

Licensing & Compatibility

License: MIT.
Compatibility: The MIT license permits commercial use and integration into closed-source projects.

Limitations & Caveats

The underlying Gemini 2.5 Flash Image model does not support direct style transfer.
Free-form text generation within images is unreliable; the recommended workaround is to render the text separately and composite it in as an input image.
System prompts are not functional, despite what the API schema indicates.
By default, input images are resized to a maximum dimension of 1024px for efficient processing; this behavior can be disabled.
The package intentionally omits multi-turn conversations and text output to keep its design lightweight.
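The default resizing of input images to a maximum dimension of 1024px comes down to a small aspect-ratio calculation. The sketch below shows one way to compute the target size. It assumes the longer side is capped and that smaller images are never upscaled, neither of which is confirmed by the summary; `fit_within` is a hypothetical helper, not a gemimg function.

```python
def fit_within(width, height, max_dim=1024):
    """Scale (width, height) so the longer side is at most max_dim, keeping aspect ratio.

    Passing max_dim=None disables resizing. Images already within the limit are
    returned unchanged (no upscaling); both choices are assumptions of this sketch.
    """
    longest = max(width, height)
    if max_dim is None or longest <= max_dim:
        return width, height
    scale = max_dim / longest
    # Round to whole pixels and never let a side collapse to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(2048, 1024))  # (1024, 512)
print(fit_within(800, 600))    # (800, 600)
```

With Pillow installed, the returned tuple could be handed to `Image.resize` before the image is encoded for upload.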

Health Check

Last Commit: 2 months ago
Responsiveness: Inactive
Star History: 9 stars in the last 30 days