ArXiv scanner using GPT-4 for personalized paper recommendations
This project provides a daily ArXiv paper scanner that leverages GPT-4 to identify relevant research papers based on user-defined topics and author matches. It's designed for researchers and academics seeking to stay updated with the latest publications in their fields without manual sifting. The system automates the discovery process, delivering curated lists via GitHub Pages or Slack.
How It Works
The assistant fetches daily ArXiv papers via RSS feeds, filtering out updated papers to focus on new submissions. It prioritizes papers by matching authors against a user-provided list, using Semantic Scholar IDs for accuracy. Remaining papers are further filtered by an H-index cutoff to manage costs. A GPT-4 model then evaluates the filtered papers for relevance and novelty based on custom prompts defined in paper_topics.txt, assigning scores. Papers are ultimately ranked by a combination of author match score and GPT-derived relevance/novelty scores.
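The filtering and ranking steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual code: the `Paper` fields, the `select_and_rank` function, the `author_weight` parameter, and the additive scoring formula are all hypothetical, and `score_fn` stands in for the GPT-4 call.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    title: str
    author_ids: list[str]     # Semantic Scholar author IDs (hypothetical field)
    max_h_index: int          # highest h-index among the authors (hypothetical field)
    is_update: bool = False   # True when the feed entry is a revision, not a new paper
    relevance: int = 0        # filled in by the model step
    novelty: int = 0          # filled in by the model step


def select_and_rank(papers, followed_ids, h_cutoff, score_fn, author_weight=10):
    """Drop updates, keep author matches, gate the rest by h-index, then rank.

    The additive total below is an assumed way of combining the scores.
    """
    followed = set(followed_ids)
    scored = []
    for paper in papers:
        if paper.is_update:
            continue  # keep only new submissions
        matches = len(followed.intersection(paper.author_ids))
        if matches == 0:
            if paper.max_h_index < h_cutoff:
                continue  # skipped before paying for a model call
            paper.relevance, paper.novelty = score_fn(paper)
        total = author_weight * matches + paper.relevance + paper.novelty
        scored.append((total, paper))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [paper for _, paper in scored]
```

In this sketch the h-index gate runs before `score_fn`, so papers that fail the cutoff never trigger a model call, which is where the cost saving would come from.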
Quick Start & Requirements
config/paper_topics.txt for topic descriptions.
config/authors.txt with Semantic Scholar IDs for author matching.
OAI_KEY (OpenAI API key) as a GitHub secret.
config/config.ini.
S2_KEY (Semantic Scholar API key) for faster author lookups.
SLACK_KEY and SLACK_CHANNEL_ID for Slack notifications.
Cost for cs.CL is ~$0.07/day.
Highlighted Details
Maintenance & Community
ruff for linting and formatting.
Licensing & Compatibility
Limitations & Caveats
The system relies heavily on the quality of prompts in paper_topics.txt and the accuracy of GPT-4's evaluations. Semantic Scholar API rate limits or slowness can impact performance if an API key is not provided. GitHub Actions inactivity may require manual intervention for forked repositories.
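One common way to soften the rate-limit problem mentioned above is to retry failed lookups with exponential backoff. The sketch below is a generic illustration, not something the project is known to do: `with_backoff` and its parameters are hypothetical, and `fetch` is any zero-argument callable that performs the actual request.

```python
import time


def with_backoff(fetch, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, doubling the wait after each failure."""
    for attempt in range(retries):
        try:
            return fetch()
        except Exception:
            # Broad catch for brevity; a real client would retry only on
            # rate-limit or timeout errors.
            if attempt == retries - 1:
                raise  # out of attempts, surface the last error
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real delays.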