paperify by jstrieb

CLI tool to transform documents into research papers

Created 2 years ago

374 stars

Top 76.2% on SourcePulse

1 Expert Loves This Project

Elvis Saravia

Founder of DAIR.AI

Project Summary

Paperify transforms any document, webpage, or ebook into a realistic-looking research paper by interspersing figures and equations from real academic papers. It is aimed at users who want a plausible-looking academic document quickly, for example for presentations, mockups, or creative projects.

How It Works

The script is written primarily in Bash and structured as a Unix pipeline. It converts the input to Markdown with Pandoc, then fetches figures and LaTeX-formatted equations from a specified number of arXiv papers and randomly intersperses them into the Markdown content at configurable frequencies. Optionally, it uses the OpenAI API to generate a title, abstract, and metadata. The final output is compiled with the IEEE LaTeX template.
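
The interspersing step can be illustrated with a minimal Python sketch. This is not the project's code (paperify itself is a Bash script). The function name `intersperse`, the `seed` parameter, and the reading of "frequency" as a per-paragraph probability are all assumptions made for illustration.

```python
import random


def intersperse(paragraphs, extras, frequency, seed=None):
    """Return paragraphs with items from extras randomly inserted between them.

    Assumption: frequency is the probability (0.0 to 1.0) that an extra
    element (figure or equation) is placed after any given paragraph.
    """
    rng = random.Random(seed)
    pool = list(extras)
    rng.shuffle(pool)  # pick extras in random order
    result = []
    for paragraph in paragraphs:
        result.append(paragraph)
        # Insert an extra only while some remain and the random draw succeeds.
        if pool and rng.random() < frequency:
            result.append(pool.pop())
    return result


# Hypothetical input: three Markdown paragraphs and two LaTeX equations.
paragraphs = ["Intro text.", "Method text.", "Closing text."]
equations = [r"$$E = mc^2$$", r"$$a^2 + b^2 = c^2$$"]
print("\n\n".join(intersperse(paragraphs, equations, frequency=0.5, seed=1)))
```

A frequency of 0.0 leaves the text unchanged, while 1.0 places an extra after every paragraph until the pool runs out. The original paragraph order is preserved in both cases.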

Quick Start & Requirements

Install dependencies: curl, Python 3, pandoc, jq, texlive (with texlive-publishers, texlive-science, lmodern, texlive-latex-extra), and optionally imagemagick.
Install script: curl -L https://github.com/jstrieb/paperify/raw/master/paperify.sh | sudo tee /usr/local/bin/paperify && sudo chmod +x /usr/local/bin/paperify
Example: paperify "input.txt" "output.pdf"
Docker image available: jstrieb/paperify
Official quick-start and examples are provided in the README.
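
Before installing, it may help to check which of the listed tools are already on the PATH. The sketch below is one way to do that in Python. The executable names are assumptions (for example, `pdflatex` stands in for the texlive installation and `convert` for imagemagick) and may differ from what the script actually invokes.

```python
import shutil

# Assumed executable names for the dependencies listed above.
REQUIRED = ["curl", "python3", "pandoc", "jq", "pdflatex"]
OPTIONAL = ["convert"]  # assumed imagemagick command name


def missing_tools(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


print("Missing required tools:", missing_tools(REQUIRED))
print("Missing optional tools:", missing_tools(OPTIONAL))
```

An empty list for the required tools suggests the executables are present. It does not confirm that the individual texlive packages are installed.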

Highlighted Details

Generates realistic academic papers from arbitrary text sources.
Intersperses figures and equations from arXiv papers with configurable frequency.
Optionally uses the OpenAI API (ChatGPT) to generate a title, abstract, and metadata.
Compiles output using the IEEE LaTeX template for authenticity.

Maintenance & Community

The project is marked as complete by the author, with no further development planned beyond addressing issues and potentially merging pull requests. The author expresses hope for long-term compatibility.

Licensing & Compatibility

The README does not explicitly state a license. The project relies on external tools like Pandoc and LaTeX, which have their own licenses. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The script uses Markdown as an intermediate format, which can lose original styling and information. Non-ASCII Unicode characters are stripped before LaTeX compilation. Image filtering is heuristic-based, so it can produce false positives and false negatives. Web pages whose image URLs contain query parameters may cause compilation errors. The script's Bash-heavy nature is described as "cursed" and difficult to read.
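
To see what the non-ASCII caveat means for input text, the following Python sketch mimics the effect of stripping every character outside the ASCII range. It only illustrates the behavior described above and is not how paperify implements it. The function name `strip_non_ascii` is hypothetical.

```python
def strip_non_ascii(text):
    """Drop every character outside the 7-bit ASCII range."""
    return text.encode("ascii", errors="ignore").decode("ascii")


print(strip_non_ascii("naïve café"))   # prints "nave caf"
print(strip_non_ascii("Schrödinger"))  # prints "Schrdinger"
```

Accented letters are removed rather than transliterated, so names and non-English text in the source document can come out visibly damaged.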

Health Check

Last Commit: 2 years ago
Responsiveness: Inactive

Star History

0 stars in the last 30 days