gpt_paper_assistant by tatsu-lab

ArXiv scanner using GPT-4 for personalized paper recommendations

created 1 year ago
530 stars

Project Summary

This project provides a daily ArXiv paper scanner that leverages GPT-4 to identify relevant research papers based on user-defined topics and author matches. It's designed for researchers and academics seeking to stay updated with the latest publications in their fields without manual sifting. The system automates the discovery process, delivering curated lists via GitHub Pages or Slack.

How It Works

The assistant fetches new ArXiv papers daily via RSS feeds, skipping updated versions of older papers so that only new submissions are considered. Papers written by authors on a user-provided list are prioritized; authors are matched by Semantic Scholar ID for accuracy. The remaining papers are filtered by an h-index cutoff to limit GPT-4 API costs. GPT-4 then scores the filtered papers for relevance and novelty using the custom prompts in paper_topics.txt. The final ranking combines the author-match score with these GPT-derived relevance and novelty scores.
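
The selection steps described above can be sketched in a few lines. This is a minimal illustration, not the repository's code: the Paper and Scored types, the select_papers function, the use of the highest author h-index for the cutoff, the default author_match_score of 15.0, and the "simple sum" ranking rule are all hypothetical, and gpt_score is a stub standing in for the real GPT-4 call.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Paper:
    arxiv_id: str
    title: str
    author_ids: list[str]      # Semantic Scholar author IDs
    max_h_index: int           # assumption: highest h-index among the authors
    is_update: bool = False    # True for revised versions of older papers

@dataclass
class Scored:
    paper: Paper
    author_score: float
    relevance: float
    novelty: float

    @property
    def total(self) -> float:
        # Hypothetical combination rule: a simple sum of the three scores.
        return self.author_score + self.relevance + self.novelty

def select_papers(
    papers: list[Paper],
    followed_authors: set[str],
    h_cutoff: int,
    gpt_score: Callable[[Paper], tuple[float, float]],
    author_match_score: float = 15.0,  # hypothetical default
) -> list[Scored]:
    results = []
    for p in papers:
        # 1. Drop updated papers so only new submissions are considered.
        if p.is_update:
            continue
        matched = bool(followed_authors.intersection(p.author_ids))
        # 2. Author matches skip the h-index cutoff; other papers must clear
        #    it before the paid GPT-4 call, which keeps API costs down.
        if not matched and p.max_h_index < h_cutoff:
            continue
        # 3. gpt_score stands in for a GPT-4 call returning (relevance, novelty).
        relevance, novelty = gpt_score(p)
        bonus = author_match_score if matched else 0.0
        results.append(Scored(p, bonus, relevance, novelty))
    # 4. Rank by the combined author-match and GPT-derived score.
    return sorted(results, key=lambda s: s.total, reverse=True)

# Usage with a stubbed scorer instead of a real GPT-4 call.
papers = [
    Paper("2401.00001", "A", ["111"], max_h_index=5),
    Paper("2401.00002", "B", ["222"], max_h_index=40),
    Paper("2401.00003", "C", ["333"], max_h_index=2),
    Paper("2401.00004", "D", ["222"], max_h_index=40, is_update=True),
]

def fake_gpt(p: Paper) -> tuple[float, float]:
    return (8.0, 6.0) if p.title == "B" else (3.0, 2.0)

ranked = select_papers(papers, {"111"}, h_cutoff=10, gpt_score=fake_gpt)
print([s.paper.arxiv_id for s in ranked])  # ['2401.00001', '2401.00002']
```

Here paper A survives despite a low h-index because its author is followed, C is dropped by the cutoff before any GPT call is made, and D is skipped as an update. Ordering the cheap filters before the GPT call is what keeps per-day API spend low.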

Quick Start & Requirements

  • Installation: Fork the repository and enable scheduled GitHub Actions.
  • Configuration:
    • Create config/paper_topics.txt for topic descriptions.
    • Create config/authors.txt with Semantic Scholar IDs for author matching.
    • Set OAI_KEY (OpenAI API key) as a GitHub secret.
    • Configure ArXiv categories in config/config.ini.
    • Set GitHub Pages build source to GitHub Actions.
  • Optional:
    • S2_KEY (Semantic Scholar API key) for faster author lookups.
    • SLACK_KEY and SLACK_CHANNEL_ID for Slack notifications.
  • Resources: Minimal compute, since the scan runs on GitHub Actions; the main expense is OpenAI API usage, estimated at ~$0.07/day for cs.CL.
  • Documentation: README
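
The configuration inputs above can be loaded with the Python standard library. This is a sketch under explicit assumptions: the [FILTERING] section and arxiv_category key, the "Author Name, SemanticScholarID" line format for authors.txt, and the helper names parse_categories, parse_authors, and load_keys are hypothetical, so check the repository's own config files before relying on them. It also assumes the GitHub Actions workflow exposes the secrets as environment variables with the same names.

```python
import configparser
import os
from typing import Mapping, Optional

def parse_categories(ini_text: str) -> list[str]:
    # Assumed section/key names; check config/config.ini in the repo.
    cfg = configparser.ConfigParser()
    cfg.read_string(ini_text)
    raw = cfg.get("FILTERING", "arxiv_category")
    return [c.strip() for c in raw.split(",") if c.strip()]

def parse_authors(authors_text: str) -> dict[str, str]:
    # Assumed format: one "Author Name, SemanticScholarID" entry per line.
    authors = {}
    for line in authors_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, s2_id = line.rpartition(",")
        authors[s2_id.strip()] = name.strip()
    return authors

def load_keys(env: Mapping[str, str] = os.environ) -> dict[str, Optional[str]]:
    # Assumes the workflow exposes the GitHub secrets as environment variables.
    if not env.get("OAI_KEY"):
        raise RuntimeError("OAI_KEY (OpenAI API key) is required")
    optional = ("S2_KEY", "SLACK_KEY", "SLACK_CHANNEL_ID")
    return {"OAI_KEY": env["OAI_KEY"], **{k: env.get(k) for k in optional}}

sample_ini = """
[FILTERING]
arxiv_category = cs.CL, cs.LG
"""
sample_authors = "# name, Semantic Scholar ID\nJane Doe, 12345\n"

print(parse_categories(sample_ini))                 # ['cs.CL', 'cs.LG']
print(parse_authors(sample_authors))                # {'12345': 'Jane Doe'}
print(load_keys({"OAI_KEY": "sk-test"})["S2_KEY"])  # None
```

Keying the author map by Semantic Scholar ID rather than by name mirrors the reason the project uses IDs: distinct researchers can share a name, but not an ID. Only OAI_KEY is treated as required, matching the split between required and optional settings listed above.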

Highlighted Details

  • Automated daily scanning and delivery via GitHub Pages or Slack.
  • GPT-4 powered relevance and novelty scoring for personalized filtering.
  • Author matching using Semantic Scholar IDs.
  • Low running cost: compute is minimal, and daily OpenAI API spend is small.
  • Configurable filtering and output options.

Maintenance & Community

  • Original author: Tatsunori Hashimoto.
  • Contributor: Chenglei Si (benchmarking).
  • Development tools: ruff for linting and formatting.

Licensing & Compatibility

  • License: Apache 2.0.
  • Compatibility: Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

Output quality depends heavily on the prompts in paper_topics.txt and on the accuracy of GPT-4's evaluations. Without an S2_KEY, Semantic Scholar API rate limits or slow responses can delay author lookups. In forked repositories, scheduled GitHub Actions workflows may need to be re-enabled manually: GitHub disables them by default in forks, and in public repositories it disables them after 60 days without repository activity.
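
One common way to cope with Semantic Scholar rate limits is to retry HTTP 429 responses with exponential backoff. The sketch below is not taken from the repository; the fetch_author helper, its retry parameters, and the injectable opener and sleep arguments are hypothetical. It uses the public Semantic Scholar Graph API author endpoint and sends the optional key in the x-api-key header.

```python
import io
import json
import time
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, Optional

S2_AUTHOR_URL = "https://api.semanticscholar.org/graph/v1/author/{}?fields=name,hIndex"

def fetch_author(
    author_id: str,
    api_key: Optional[str] = None,
    max_retries: int = 5,
    base_delay: float = 1.0,
    opener: Callable = urllib.request.urlopen,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    if max_retries < 1:
        raise ValueError("max_retries must be at least 1")
    req = urllib.request.Request(S2_AUTHOR_URL.format(urllib.parse.quote(author_id)))
    if api_key:
        # With a key (S2_KEY), send it in the x-api-key header.
        req.add_header("x-api-key", api_key)
    for attempt in range(max_retries):
        try:
            with opener(req, timeout=30) as resp:
                return json.load(resp)
        except urllib.error.HTTPError as err:
            # Retry only on HTTP 429 (rate limited); re-raise anything else,
            # and re-raise on the final failed attempt.
            if err.code != 429 or attempt == max_retries - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # waits 1s, 2s, 4s, ...
    raise AssertionError("unreachable")

# Offline usage: a fake opener that rate-limits the first two requests.
calls = []

def fake_opener(req, timeout):
    calls.append(req.get_header("X-api-key"))
    if len(calls) < 3:
        raise urllib.error.HTTPError(req.full_url, 429, "Too Many Requests", None, None)
    return io.BytesIO(b'{"name": "Jane Doe", "hIndex": 42}')

delays = []
author = fetch_author("12345", api_key="k", opener=fake_opener, sleep=delays.append)
print(author, delays)  # {'name': 'Jane Doe', 'hIndex': 42} [1.0, 2.0]
```

Injecting opener and sleep keeps the retry logic testable without network access or real waiting. Against the live API, you would call fetch_author(author_id, api_key=os.environ.get("S2_KEY")) with the defaults.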

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull requests (last 30 days): 0
  • Issues (last 30 days): 0
  • Star history: 15 stars in the last 90 days
