paperify by jstrieb

CLI tool to transform documents into research papers

created 1 year ago
372 stars

Top 77.4% on SourcePulse

Project Summary

Paperify transforms any document, webpage, or ebook into a realistic-looking research paper by interspersing figures and equations taken from real academic papers. It is aimed at users who want plausible-looking academic documents quickly, for presentations, mockups, or creative purposes.

How It Works

The script is written primarily in Bash and structured as a Unix pipeline. It converts the input to Markdown via Pandoc, then fetches figures and LaTeX-formatted equations from a specified number of arXiv papers. These elements are randomly interspersed into the Markdown content at configurable frequencies. Optionally, the script uses the OpenAI API to generate a title, abstract, and metadata. The final output is compiled with the IEEE LaTeX template.
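The random interspersing step can be sketched roughly as follows. This is an illustration only, not paperify's actual code: the function name `intersperse`, its arguments, and the use of awk are assumptions. It reads Markdown on standard input and, after each blank-line-separated paragraph, inserts a Markdown image reference with probability 1/FREQ.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; paperify's real implementation may differ.
# Usage: intersperse FREQ SEED FIGURE...
#   FREQ    insert a figure after roughly one in FREQ paragraphs (0 = never)
#   SEED    seed for awk's random number generator
#   FIGURE  image paths to pick from (paths must not contain spaces)
intersperse() {
  local freq=$1 seed=$2
  shift 2
  awk -v freq="$freq" -v seed="$seed" -v figs="$*" '
    BEGIN {
      RS = ""          # paragraph mode: records are separated by blank lines
      ORS = "\n\n"     # keep a blank line between output blocks
      srand(seed)
      n = split(figs, fig, " ")
    }
    {
      print            # emit the paragraph unchanged
      if (n > 0 && freq > 0 && rand() < 1 / freq) {
        i = int(rand() * n) + 1        # pick one figure uniformly at random
        print "![](" fig[i] ")"
      }
    }
  '
}
```

Assuming Pandoc is installed, something like `pandoc input.html -t markdown | intersperse 3 "$RANDOM" fig1.png fig2.png` would feed converted Markdown through the sketch. The file names here are placeholders.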

Quick Start & Requirements

  • Install dependencies: curl, Python 3, pandoc, jq, texlive (with texlive-publishers, texlive-science, lmodern, texlive-latex-extra), and optionally imagemagick.
  • Install script: curl -L https://github.com/jstrieb/paperify/raw/master/paperify.sh | sudo tee /usr/local/bin/paperify && sudo chmod +x /usr/local/bin/paperify
  • Example: paperify "input.txt" "output.pdf"
  • Docker image available: jstrieb/paperify
  • Official quick-start and examples are provided in the README.
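
Before installing, it may help to confirm that the required commands are on the PATH. The helper below is a minimal sketch and is not part of paperify; the name `missing_deps` is hypothetical. It prints each command name that cannot be found.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not shipped with paperify.
# Prints the name of every argument that is not an executable on PATH.
missing_deps() {
  local cmd
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}
```

For example, `missing_deps curl python3 pandoc jq` prints nothing when all four are installed. Checking the texlive packages would need a different test, since they are not single commands.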

Highlighted Details

  • Generates realistic academic papers from arbitrary text sources.
  • Intersperses figures and equations from arXiv papers with configurable frequency.
  • Optional ChatGPT integration for title, abstract, and metadata generation.
  • Compiles output using the IEEE LaTeX template for authenticity.

Maintenance & Community

The project is marked as complete by the author, with no further development planned beyond addressing issues and potentially merging pull requests. The author expresses hope for long-term compatibility.

Licensing & Compatibility

The README does not explicitly state a license. The project relies on external tools like Pandoc and LaTeX, which have their own licenses. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The script uses Markdown as an intermediate format, which can lead to loss of original styling and information. Non-ASCII Unicode characters are stripped before LaTeX compilation. Image filtering is heuristic-based and may result in false positives/negatives. Some web pages with query parameters in image URLs may cause compilation errors. The script's Bash-heavy nature is described as "cursed" and difficult to read.
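
Two of these caveats can be illustrated with small shell functions. Both are sketches under assumptions: `strip_non_ascii` shows one common way to drop non-ASCII bytes, which is not necessarily how paperify does it, and `strip_query` is a hypothetical workaround for image URLs that carry query parameters.

```shell
#!/usr/bin/env bash
# Illustrative only; function names are hypothetical.

# Delete every byte except printable ASCII, tab, newline, and carriage return.
# LC_ALL=C makes tr operate on bytes rather than multibyte characters.
strip_non_ascii() {
  LC_ALL=C tr -cd '\11\12\15\40-\176'
}

# Remove the query string (everything from the first "?") from a URL.
strip_query() {
  printf '%s\n' "${1%%\?*}"
}
```

Note that byte-level stripping turns a word such as "café" into "caf", which is the kind of information loss the caveat describes.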

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 1 star in the last 90 days
