Nano-PDF by gavrielc

AI-powered CLI for PDF editing and slide generation

Created 2 months ago

1,095 stars

Top 34.6% on SourcePulse

2 Experts Love This Project

Project Summary

Nano-PDF is a command-line interface (CLI) tool for editing PDF documents, primarily presentations, with natural language prompts. Edits are generated by Google's Gemini 3 Pro Image model. It targets users who need to make quick content or style changes to a PDF without opening a traditional graphical editor.

How It Works

Nano-PDF works in four stages. First, it renders the target PDF pages to images using the Poppler library. Second, it sends those images, optionally with style reference pages, together with the natural language editing prompt to Google's Gemini 3 Pro Image model, which returns an edited image. Third, it runs Tesseract OCR on the edited image to re-hydrate a searchable text layer. Finally, it stitches the edited images back into the original PDF structure, so the output stays text-selectable. Because the text layer of an edited page is rebuilt by OCR rather than carried over, the selectable text reflects what the OCR recognized. Multiple pages can be processed in parallel.
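The render and OCR stages can be pictured with the command-line tools that ship with Poppler and Tesseract. This is a minimal sketch, not Nano-PDF's own code: the helper names `render_command` and `ocr_command` are hypothetical, the 150 DPI default is an assumption, and the tool may well call these libraries through Python bindings instead. The Gemini call and the final stitching step are left out because the summary does not describe them in enough detail.

```python
from pathlib import Path


def render_command(pdf: Path, page: int, out_prefix: Path, dpi: int = 150) -> list[str]:
    """Build a pdftoppm (Poppler) call that rasterizes a single page to PNG."""
    return [
        "pdftoppm",
        "-png",            # write PNG output
        "-r", str(dpi),    # rendering resolution in DPI (assumed default)
        "-f", str(page),   # first page to render
        "-l", str(page),   # last page to render (same page: render just one)
        str(pdf),
        str(out_prefix),   # pdftoppm appends the page number and extension
    ]


def ocr_command(image: Path, out_base: Path) -> list[str]:
    """Build a tesseract call that writes <out_base>.pdf with a text layer."""
    return ["tesseract", str(image), str(out_base), "pdf"]
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine where both tools are installed.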

Quick Start & Requirements

Primary install: pip install nano-pdf
Prerequisites:
- Python 3.10+
- A paid Google Gemini API key with billing enabled (free tier keys are insufficient for image generation).
- GEMINI_API_KEY environment variable must be set.
- System dependencies: poppler (for PDF rendering) and tesseract (for OCR). Installation instructions are provided for macOS, Windows, and Linux (Ubuntu/Debian).
Links: Google AI Studio (for API key).
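Before the first run it can help to confirm the prerequisites listed above. The following is a small pre-flight sketch, not part of Nano-PDF: the function name `missing_prerequisites` is hypothetical, and it assumes Poppler is detectable through its `pdftoppm` binary and Tesseract through its `tesseract` binary on the PATH. It cannot tell whether the API key has billing enabled.

```python
import os
import shutil
import sys


def missing_prerequisites(env=None, which=shutil.which, version=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else version
    problems = []
    if tuple(version) < (3, 10):
        problems.append("Python 3.10+ is required")
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    # pdftoppm ships with Poppler; tesseract is the OCR engine's CLI.
    for tool in ("pdftoppm", "tesseract"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print("missing:", problem)
```

The `env`, `which`, and `version` parameters exist only so the check can be exercised without touching the real environment.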

Highlighted Details

Natural Language Editing: Modify content, update text, change charts, or fix typos using descriptive prompts.
Add New Slides: Generate entirely new slides that automatically match the visual style of the existing deck.
Non-Destructive Editing: Keeps edited pages searchable by rebuilding the text layer through OCR re-hydration.
Multi-page & Parallel Processing: Edit multiple PDF pages concurrently for faster workflows.
Style Referencing: Option to specify reference pages (--style-refs) to guide the AI in matching fonts, colors, and layout.
Configurable Resolution: Control image quality and processing speed via --resolution (4K, 2K, 1K).
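The multi-page item in the list above can be illustrated with a thread pool. This is a sketch of the general technique under stated assumptions, not Nano-PDF's implementation: `edit_pages_concurrently` and `fake_edit` are hypothetical names, and `fake_edit` merely stands in for the real per-page work (render, model call, OCR). Threads are a reasonable fit here on the assumption that most of the time per page is spent waiting on a network API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def edit_pages_concurrently(
    prompts: dict[int, str],
    edit_page: Callable[[int, str], str],
    max_workers: int = 4,
) -> dict[int, str]:
    """Apply edit_page to every (page, prompt) pair using worker threads."""
    pages = sorted(prompts)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input pages.
        results = pool.map(lambda p: edit_page(p, prompts[p]), pages)
        return dict(zip(pages, results))


def fake_edit(page: int, prompt: str) -> str:
    # Placeholder for the real per-page work; returns a label instead of an image.
    return f"page {page}: {prompt}"


if __name__ == "__main__":
    print(edit_pages_concurrently({2: "fix typo", 1: "new title"}, fake_edit))
```

Results are keyed by page number, so the caller can reassemble pages in document order regardless of which edit finishes first.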

Maintenance & Community

The README excerpt does not mention maintainers, community channels (such as Discord or Slack), or a project roadmap.

Licensing & Compatibility

License: MIT.
Compatibility: The MIT license is permissive and generally allows for commercial use and integration into closed-source projects.

Limitations & Caveats

Nano-PDF requires a paid Google Gemini API tier; free tier keys will not work. The accuracy of the OCR re-hydration can vary, particularly with highly stylized fonts or very small text. Higher --resolution settings slow processing. The tool also depends on Poppler and Tesseract being correctly installed and accessible on the system.

Health Check

Last Commit: 2 months ago
Responsiveness: Inactive

Star History

135 stars in the last 30 days