paperbanana by llmsresearch

Agentic framework for automated academic illustration

Created 1 week ago

510 stars

Top 61.6% on SourcePulse

1 Expert Loves This Project

Pawel Garbacki

Cofounder of Fireworks AI

Project Summary

This project is an open-source implementation and extension of Google Research's PaperBanana that automates the creation of academic figures, diagrams, and statistical plots from text descriptions. It targets AI scientists and researchers who want publication-quality visuals generated efficiently, using an agentic framework powered by Google Gemini.

How It Works

PaperBanana uses a two-phase, multi-agent pipeline with five specialized agents: Retriever, Planner, Stylist, Visualizer, and Critic. In Phase 1, the Retriever selects relevant examples, the Planner generates a detailed textual description, and the Stylist refines that description for visual aesthetics based on NeurIPS-style guidelines. Phase 2 is iterative refinement: the Visualizer renders an image, using Gemini 3 Pro for diagrams or Matplotlib for plots, and the Critic evaluates the output and provides feedback in the form of a revised description. This refinement loop repeats up to three times. Gemini's VLM capabilities handle planning and critique.
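The control flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual code: the function `run_pipeline`, the `Critique` and `PipelineResult` types, and the callable signatures are all hypothetical, and the agents are passed in as plain functions so that no Gemini or Matplotlib call is implied.

```python
from dataclasses import dataclass
from typing import Callable, List

MAX_REFINEMENTS = 3  # the summary states the loop repeats up to three times


@dataclass
class Critique:
    """Hypothetical Critic output: a verdict plus a revised description."""
    accepted: bool
    revised_description: str


@dataclass
class PipelineResult:
    image: str          # stand-in for the rendered figure
    description: str    # the description the final image was rendered from
    iterations: int     # number of render/critique rounds performed


def run_pipeline(
    source_text: str,
    caption: str,
    retrieve: Callable[[str, str], List[str]],
    plan: Callable[[str, str, List[str]], str],
    style: Callable[[str], str],
    render: Callable[[str], str],
    critique: Callable[[str, str], Critique],
    max_refinements: int = MAX_REFINEMENTS,
) -> PipelineResult:
    # Phase 1 (linear planning): Retriever -> Planner -> Stylist.
    examples = retrieve(source_text, caption)
    description = style(plan(source_text, caption, examples))

    # Phase 2 (iterative refinement): Visualizer <-> Critic.
    image = ""
    iterations = 0
    for iterations in range(1, max_refinements + 1):
        image = render(description)
        verdict = critique(image, description)
        # Stop on acceptance, or when the round budget is exhausted.
        if verdict.accepted or iterations == max_refinements:
            break
        description = verdict.revised_description
    return PipelineResult(image, description, iterations)


def demo() -> PipelineResult:
    """Run the loop with stub agents; the stub Critic accepts the second render."""
    renders: List[str] = []

    def retrieve(text: str, caption: str) -> List[str]:
        return ["example-1"]

    def plan(text: str, caption: str, examples: List[str]) -> str:
        return f"plan for {caption!r} using {len(examples)} example(s)"

    def style(description: str) -> str:
        return description + " [styled]"

    def render(description: str) -> str:
        renders.append(description)
        return f"image-v{len(renders)}"

    def critique(image: str, description: str) -> Critique:
        return Critique(image == "image-v2", description + " [revised]")

    return run_pipeline("method text", "overview", retrieve, plan, style, render, critique)


if __name__ == "__main__":
    print(demo())
```

Passing the agents in as callables keeps the sketch independent of any model provider. The real implementation may structure the agents and the stopping condition differently.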

Quick Start & Requirements

Installation: Install via pip: pip install paperbanana. For development, clone the repository and install with pip install -e ".[dev,google]".
Prerequisites: Python 3.10+ and a Google Gemini API key obtained from Google AI Studio.
Setup: Run paperbanana setup for an interactive wizard to configure the API key, or manually edit the .env file.
Usage: Generate diagrams with paperbanana generate --input <text_file> --caption "<description>". Generate plots with paperbanana plot --data <csv_file> --intent "<description>". Evaluate diagrams with paperbanana evaluate --generated <img1> --reference <img2> --context <text_file> --caption "<description>".
Links: Original Paper, Google AI Studio.
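For scripted use, the three CLI commands listed above can be assembled as argument lists and handed to `subprocess`. The sketch below only builds the command lines, using the flags quoted in the usage notes; the helper names and the example file names are hypothetical, and it does not use or assume anything about the project's Python API.

```python
import shlex
from typing import List


def generate_cmd(input_path: str, caption: str) -> List[str]:
    """Argument list for `paperbanana generate` (diagram from a text file)."""
    return ["paperbanana", "generate", "--input", input_path, "--caption", caption]


def plot_cmd(data_path: str, intent: str) -> List[str]:
    """Argument list for `paperbanana plot` (statistical plot from a CSV file)."""
    return ["paperbanana", "plot", "--data", data_path, "--intent", intent]


def evaluate_cmd(generated: str, reference: str, context: str, caption: str) -> List[str]:
    """Argument list for `paperbanana evaluate` (compare generated vs. reference)."""
    return [
        "paperbanana", "evaluate",
        "--generated", generated,
        "--reference", reference,
        "--context", context,
        "--caption", caption,
    ]


if __name__ == "__main__":
    # File names here are placeholders chosen for illustration only.
    cmd = generate_cmd("method.txt", "Overview of the proposed framework")
    print(shlex.join(cmd))  # shell-quoted form, handy for logging
    # To execute it (requires paperbanana installed and an API key configured):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```

Building a list rather than a shell string avoids quoting problems when captions contain spaces or quotation marks.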

Highlighted Details

Agentic framework with specialized roles for planning, visualization, and critique.
Two-phase pipeline: Linear Planning and Iterative Refinement.
Leverages Google Gemini for VLM (planning, critique) and image generation.
Provides CLI, Python API, and an MCP server for IDE integration (e.g., Claude Code, Cursor).
Includes a VLM-as-a-Judge evaluation component for assessing diagram quality.
Supports generation of both methodology diagrams and statistical plots.

Maintenance & Community

This is an unofficial, community-driven open-source implementation. The README does not name active maintainers, sponsors, or dedicated community channels (such as Discord or Slack); the GitHub repository is the only point of contact it mentions.

Licensing & Compatibility

The project is released under the MIT License, a permissive license that allows commercial use and integration into closed-source projects, provided the copyright and license notice is retained.

Limitations & Caveats

This project is an unofficial reimplementation based on a research paper and is not affiliated with or endorsed by the original authors or Google Research. The implementation may differ from the original system described in the paper. Users should exercise discretion.

Health Check

Last Commit

1 day ago

Responsiveness

Inactive

Star History

523 stars in the last 10 days