paper-reviewer  by deep-diver

Paper reviewer for auto-generating blog posts

Created 11 months ago
790 stars

Top 44.5% on SourcePulse

Project Summary

This project provides tools that automatically generate comprehensive reviews of arXiv and OpenReview papers and convert them into blog posts. It is aimed at researchers, academics, and content creators who want to streamline summarizing and disseminating scientific literature. It powers Hugging Face's Daily Papers and NeurIPS 2024 web pages.

How It Works

The system uses two Python scripts: collect.py gathers paper data and generates reviews, and convert.py turns those reviews into blog posts. collect.py can use one of two backends for visual information extraction, Upstage (paid) or Gemini (best-effort), and can run MinerU on a GPU when MinerU is configured for CUDA (see Quick Start & Requirements). convert.py then applies a fixed template to structure the review as a blog post, and can optionally upload images to Cloudflare R2.
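The two-step flow described above can be sketched as a small wrapper that builds the command lines for both scripts. This is only an illustrative sketch: the helper names build_commands and run_pipeline are hypothetical, it assumes collect.py and convert.py sit in the current working directory, and it uses only the flags listed under Usage.

```python
import subprocess
import sys


def build_commands(arxiv_id, use_upstage=False, stop_at_no_html=False,
                   upload_images_r2=False):
    """Return the two command lines (collect, then convert) as argument lists.

    Hypothetical helper: only the flags documented in the README are used.
    """
    collect = [sys.executable, "collect.py", "--arxiv-id", arxiv_id]
    if stop_at_no_html:
        collect.append("--stop-at-no-html")
    if use_upstage:
        collect.append("--use-upstage")

    convert = [sys.executable, "convert.py", "--arxiv-id", arxiv_id]
    if upload_images_r2:
        convert.append("--upload-images-r2")

    return [collect, convert]


def run_pipeline(arxiv_id, **options):
    """Run review generation, then blog conversion, stopping on the first failure."""
    for command in build_commands(arxiv_id, **options):
        # check=True raises CalledProcessError if a script exits non-zero,
        # so convert.py never runs on a failed collect step.
        subprocess.run(command, check=True)
```

Calling run_pipeline with an arXiv ID and, for example, use_upstage=True would run both steps in order. Passing arguments as a list avoids shell quoting issues with the ID.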

Quick Start & Requirements

  • Installation: pip install -r requirements.txt
  • Prerequisites:
    • GEMINI_API_KEY environment variable (mandatory for Gemini).
    • UPSTAGE_API_KEY (optional, for Upstage document parsing).
    • Cloudflare R2 credentials (R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, R2_S3_ENDPOINT_URL, R2_DOMAIN_NAME) for image uploads.
    • poppler-utils (Ubuntu: apt install poppler-utils, macOS: brew install poppler).
    • For MinerU with GPU: Python 3.10, modify ~/magic-pdf.json to set "device-mode": "cuda".
  • Usage:
    • Review generation: python collect.py --arxiv-id "..." [--stop-at-no-html] [--use-upstage]
    • Blog post conversion: python convert.py --arxiv-id "..." [--upload-images-r2]
  • Documentation: AI Paper Reviewer
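
Before running either script, the prerequisites above can be checked with a short preflight sketch like the one below. The function name missing_prerequisites is hypothetical, and probing for pdftoppm (one of the binaries shipped in poppler-utils) is an assumption about how to detect poppler; the project itself may rely on a different poppler tool.

```python
import os
import shutil

# Environment variable names taken from the prerequisites list.
REQUIRED_ENV = ["GEMINI_API_KEY"]
UPSTAGE_ENV = ["UPSTAGE_API_KEY"]
R2_ENV = [
    "R2_ACCESS_KEY_ID",
    "R2_SECRET_ACCESS_KEY",
    "R2_S3_ENDPOINT_URL",
    "R2_DOMAIN_NAME",
]


def missing_prerequisites(env=None, use_upstage=False, upload_images_r2=False,
                          which=shutil.which):
    """Return a list of missing prerequisites; an empty list means ready to run."""
    env = os.environ if env is None else env

    needed = list(REQUIRED_ENV)
    if use_upstage:
        needed += UPSTAGE_ENV
    if upload_images_r2:
        needed += R2_ENV

    # A variable counts as missing if it is unset or empty.
    missing = [name for name in needed if not env.get(name)]

    # Assumption: pdftoppm on PATH is used as a proxy for poppler-utils.
    if which("pdftoppm") is None:
        missing.append("poppler-utils (pdftoppm not found on PATH)")

    return missing
```

The env and which parameters exist only so the check can be exercised without touching the real environment; in normal use they can be left at their defaults.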

Highlighted Details

  • Powers Hugging Face Daily Papers and NeurIPS 2024 web pages.
  • Offers choice between Upstage (paid, higher accuracy) and Gemini (free, best-effort) for visual extraction.
  • Supports GPU acceleration via MinerU for enhanced performance.
  • Includes optional image uploading to Cloudflare R2 for blog posts.

Maintenance & Community

The project powers Hugging Face initiatives such as Daily Papers, but the last commit was 7 months ago and there has been no pull request or issue activity in the past 30 days. The README does not provide further community engagement details.

Licensing & Compatibility

The README does not specify a license. Compatibility for commercial use or closed-source linking is not detailed.

Limitations & Caveats

The accuracy of visual information extraction without Upstage is noted as best-effort. Customizing the blog post design requires manual modification of the template files. MinerU usage requires specific Python versions and configuration adjustments for GPU support.
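
The MinerU GPU adjustment mentioned above amounts to changing one key in ~/magic-pdf.json. A minimal sketch, assuming the file already exists and holds a JSON object; the function name set_device_mode is hypothetical, and only the "device-mode" key named in the prerequisites is touched:

```python
import json
from pathlib import Path


def set_device_mode(config_path="~/magic-pdf.json", mode="cuda"):
    """Set "device-mode" in an existing MinerU config file and return the config.

    Raises FileNotFoundError if the file is missing, rather than creating a
    partial config with only this one key.
    """
    path = Path(config_path).expanduser()
    config = json.loads(path.read_text(encoding="utf-8"))

    # Leave every other setting as it was; change only the device mode.
    config["device-mode"] = mode

    path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return config
```

Backing up the file before rewriting it is advisable, since the rewrite normalizes indentation and key formatting.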

Health Check

  • Last Commit: 7 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 30 days
