PaperBanana by dwzhu-pku

Automating academic illustration generation for AI scientists

Created 4 months ago

6,554 stars

Top 7.7% on SourcePulse

1 Expert Loves This Project

Pawel Garbacki

Cofounder of Fireworks AI

Project Summary

PaperBanana automates the creation of academic illustrations for AI scientists, transforming raw scientific content into publication-quality diagrams and plots. It targets researchers seeking to accelerate visual communication and publication workflows by providing a sophisticated, multi-agent generation framework.

How It Works

The framework employs a reference-driven, multi-agent pipeline. Specialized agents (Retriever, Planner, Stylist, Visualizer, Critic) collaborate to generate illustrations. The Retriever identifies relevant examples, the Planner translates content into descriptions, the Stylist refines aesthetics, the Visualizer creates images, and the Critic iteratively refines outputs. This approach leverages in-context learning and iterative refinement for high-quality, semantically accurate, and aesthetically pleasing results.
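The hand-off between the five agents can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the Draft class, every function name, the word-overlap retrieval, and the fixed-round Critic are all hypothetical stand-ins for the model-backed agents described above.

```python
from dataclasses import dataclass, field


@dataclass
class Draft:
    """State passed between the hypothetical agent stages."""
    content: str
    references: list[str] = field(default_factory=list)
    description: str = ""
    image: str = ""  # stand-in string; a real system would hold image data
    feedback: list[str] = field(default_factory=list)


def retriever(draft: Draft, corpus: list[str]) -> Draft:
    # Toy retrieval: keep examples sharing at least one word with the content.
    words = set(draft.content.lower().split())
    draft.references = [ex for ex in corpus if words & set(ex.lower().split())]
    return draft


def planner(draft: Draft) -> Draft:
    # Turn raw content into a textual description of the figure.
    draft.description = f"Diagram of: {draft.content} (refs: {len(draft.references)})"
    return draft


def stylist(draft: Draft) -> Draft:
    # Append style guidance to the description.
    draft.description += " | style: academic"
    return draft


def visualizer(draft: Draft) -> Draft:
    # Placeholder render; the version number counts critique rounds so far.
    draft.image = f"<image v{len(draft.feedback) + 1}: {draft.description}>"
    return draft


def critic(draft: Draft, max_rounds: int) -> bool:
    """Return True to accept; otherwise record one piece of feedback."""
    if len(draft.feedback) >= max_rounds:
        return True
    draft.feedback.append("tighten layout")
    return False


def run_pipeline(content: str, corpus: list[str], max_rounds: int = 2) -> Draft:
    draft = stylist(planner(retriever(Draft(content), corpus)))
    draft = visualizer(draft)
    # Critic-Visualizer loop: revise and re-render until the Critic accepts.
    while not critic(draft, max_rounds):
        draft.description += f" | revised: {draft.feedback[-1]}"
        draft = visualizer(draft)
    return draft
```

Calling run_pipeline("transformer attention block", ["attention heatmap figure", "loss curve plot"]) walks one input through all five stages. In a real system the Critic would presumably stop on a quality judgment rather than a fixed round count.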

Quick Start & Requirements

To install, clone the repository, set up Python 3.12 with uv, and install dependencies via uv pip install -r requirements.txt. Configuration requires API keys for the underlying models (e.g., Gemini); downloading the PaperBananaBench dataset is optional. Users can then launch an interactive Streamlit demo (streamlit run demo.py) or use the command-line interface (python main.py).

Highlighted Details

Multi-Agent Pipeline: Orchestrates specialized agents for a structured generation process.
Reference-Driven: Utilizes a curated collection of examples for guidance via generative retrieval.
Iterative Refinement: Employs a Critic-Visualizer loop for progressive quality enhancement.
Style-Aware Generation: Integrates automatically synthesized style guidelines for academic aesthetics.
Interactive Demo: Streamlit interface for easy generation, configuration, and refinement.
Parallel Processing: Supports generating up to 20 candidate diagrams concurrently.
High-Resolution Output: Enables upscaling to 2K/4K resolutions.
Extensible Design: Modular agent architecture allows for customization.
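
As a rough illustration of the parallel-processing feature listed above, the following sketch fans one prompt out to several workers and caps the batch at 20. The function names and the thread-pool approach are assumptions made for this example; the source does not say how PaperBanana implements concurrency.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CANDIDATES = 20  # cap taken from the feature list; the rest is hypothetical


def generate_candidate(prompt: str, seed: int) -> str:
    # Placeholder for one full pipeline run (hypothetical helper).
    return f"candidate-{seed}: {prompt}"


def generate_candidates(prompt: str, n: int) -> list[str]:
    # Clamp the requested count to the range 1..MAX_CANDIDATES.
    n = max(1, min(n, MAX_CANDIDATES))
    with ThreadPoolExecutor(max_workers=n) as pool:
        # Executor.map returns results in input order, so seeds stay aligned.
        return list(pool.map(lambda seed: generate_candidate(prompt, seed), range(n)))
```

Threads suit this sketch because each candidate would mostly wait on remote model calls; a request for more than 20 candidates is silently clamped rather than rejected, which is a design choice of the example only.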

Maintenance & Community

The project is actively supported by community contributions, with several related forks and projects noted. It is explicitly stated that this is not an officially supported Google product and that there are no current plans for commercialization.

Licensing & Compatibility

PaperBanana is released under the Apache-2.0 license. However, Google has filed patents for the core workflows, which restricts third-party commercial applications utilizing similar logic, though open-source research efforts are unaffected.

Limitations & Caveats

The project acknowledges that further development is needed for more reliable generation and handling diverse, complex scenarios. The patent filings by Google impose restrictions on commercial use of similar logic by third parties. It is not an officially supported Google product.

Health Check

Last Commit: 1 month ago
Responsiveness: Inactive

Star History

305 stars in the last 30 days