gpt_paper_assistant by tatsu-lab

ArXiv scanner using GPT-4 for personalized paper recommendations

Created 2 years ago

541 stars

Top 58.8% on SourcePulse

Project Summary

This project provides a daily ArXiv paper scanner that leverages GPT-4 to identify relevant research papers based on user-defined topics and author matches. It's designed for researchers and academics seeking to stay updated with the latest publications in their fields without manual sifting. The system automates the discovery process, delivering curated lists via GitHub Pages or Slack.

How It Works

The assistant fetches each day's ArXiv papers via RSS feeds and drops updated (revised) papers so that only new submissions remain. Papers by authors on a user-provided list are prioritized; authors are matched by Semantic Scholar ID rather than by name, for accuracy. The remaining papers are filtered by an author h-index cutoff, which limits how many papers are sent to GPT-4 and so keeps costs down. GPT-4 then scores the filtered papers for relevance and novelty against the custom prompt criteria defined in paper_topics.txt. Finally, papers are ranked by a combination of the author match score and the GPT-derived relevance and novelty scores.
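The selection and ranking steps described above can be sketched as a single pass over the day's papers. This is a minimal illustration under stated assumptions, not the repository's code: the `Paper` fields, the `select_and_rank` function, the stubbed scorer standing in for the GPT-4 call, and the additive weighting with `AUTHOR_MATCH_BONUS` are all hypothetical, and the ArXiv IDs and scores in the demo are made-up placeholders.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Paper:
    arxiv_id: str
    title: str
    author_ids: list[str]        # Semantic Scholar author IDs
    author_h_indices: list[int]  # h-index of each author, same order
    is_update: bool = False      # True if the RSS entry is a revision, not a new submission

# Stand-in for the GPT-4 call: maps a paper to (relevance, novelty) scores.
Scorer = Callable[[Paper], tuple[int, int]]

AUTHOR_MATCH_BONUS = 10  # hypothetical weight for a followed-author match

def select_and_rank(papers: list[Paper], followed: set[str],
                    h_cutoff: int, score: Scorer) -> list[tuple[Paper, int]]:
    ranked = []
    for p in papers:
        if p.is_update:  # step 1: keep only new submissions
            continue
        author_match = bool(followed.intersection(p.author_ids))  # step 2
        # step 3: unmatched papers need at least one author at or above the cutoff
        if not author_match and max(p.author_h_indices, default=0) < h_cutoff:
            continue
        relevance, novelty = score(p)  # step 4: LLM scoring (stubbed here)
        total = relevance + novelty + (AUTHOR_MATCH_BONUS if author_match else 0)
        ranked.append((p, total))
    # step 5: highest combined score first
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked

# Demo with placeholder data: paper A matches a followed author despite a low
# h-index, C is dropped by the cutoff, and D is dropped as an update.
papers = [
    Paper("2401.00001", "A", ["111"], [5]),
    Paper("2401.00002", "B", ["222"], [40]),
    Paper("2401.00003", "C", ["333"], [3]),
    Paper("2401.00004", "D", ["222"], [40], is_update=True),
]
fake_scores = {"2401.00001": (3, 2), "2401.00002": (9, 8)}
result = select_and_rank(papers, {"111"}, 20, lambda p: fake_scores[p.arxiv_id])
print([(p.arxiv_id, s) for p, s in result])  # [('2401.00002', 17), ('2401.00001', 15)]
```

Calling the scorer only after the cheap filters is the point of the ordering: every paper removed by the update check or the h-index cutoff is one fewer model call.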

Quick Start & Requirements

Installation: Fork the repository and enable scheduled GitHub Actions.
Configuration:
- Create config/paper_topics.txt for topic descriptions.
- Create config/authors.txt with Semantic Scholar IDs for author matching.
- Set OAI_KEY (OpenAI API key) as a GitHub secret.
- Configure ArXiv categories in config/config.ini.
- Set GitHub Pages build source to GitHub Actions.
Optional:
- S2_KEY (Semantic Scholar API key) for faster author lookups.
- SLACK_KEY and SLACK_CHANNEL_ID for Slack notifications.
Resources: Minimal compute required; the estimated OpenAI API cost for the cs.CL category is about $0.07/day.
Documentation: README
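The configuration steps above can be illustrated with a small loader. This is a hedged sketch, not the project's loader: the `[FILTERING]` section and `arxiv_category` key, the `Author Name, ID` line format for authors.txt, and the function names are assumptions (check the README for the real formats). It also assumes the workflow exposes the GitHub secrets as environment variables with the same names (OAI_KEY, S2_KEY, SLACK_KEY, SLACK_CHANNEL_ID); the author name and ID in the demo are placeholders.

```python
import configparser
import os
from typing import Mapping, Optional

def parse_authors(text: str) -> dict[str, str]:
    """Map Semantic Scholar ID -> author name.

    Assumes each line looks like 'Author Name, 1234567' (hypothetical format)."""
    authors: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, s2_id = line.rpartition(",")
        if not sep:
            continue  # no comma: not in the assumed format
        authors[s2_id.strip()] = name.strip()
    return authors

def load_categories(ini_text: str) -> list[str]:
    """Read a comma-separated ArXiv category list from config.ini text.

    The [FILTERING] section and arxiv_category key are hypothetical names."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    raw = parser.get("FILTERING", "arxiv_category")
    return [c.strip() for c in raw.split(",") if c.strip()]

def load_secrets(env: Mapping[str, str] = os.environ) -> dict[str, Optional[str]]:
    """Read the secrets named in the setup steps; only OAI_KEY is required."""
    if not env.get("OAI_KEY"):
        raise RuntimeError("OAI_KEY is not set")
    return {
        "openai": env["OAI_KEY"],
        "s2": env.get("S2_KEY"),                       # optional: faster author lookups
        "slack_key": env.get("SLACK_KEY"),             # optional: Slack delivery
        "slack_channel": env.get("SLACK_CHANNEL_ID"),  # optional: Slack delivery
    }

print(load_categories("[FILTERING]\narxiv_category = cs.CL, cs.LG\n"))  # ['cs.CL', 'cs.LG']
print(parse_authors("Jane Doe, 1234567\n# placeholder author\n"))      # {'1234567': 'Jane Doe'}
```

Failing fast on a missing OAI_KEY while treating the other keys as optional mirrors the split between required and optional settings listed above.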

Highlighted Details

Automated daily scanning and delivery via GitHub Pages or Slack.
GPT-4 powered relevance and novelty scoring for personalized filtering.
Author matching using Semantic Scholar IDs.
Low daily operating cost, driven mainly by GPT-4 API usage.
Configurable filtering and output options.

Maintenance & Community

Original author: Tatsunori Hashimoto.
Contributor: Chenglei Si (benchmarking).
Development tools: ruff for linting and formatting.

Licensing & Compatibility

License: Apache 2.0.
Compatibility: Permissive license suitable for commercial use and integration with closed-source projects.

Limitations & Caveats

The system relies heavily on the quality of the prompts in paper_topics.txt and on the accuracy of GPT-4's evaluations. Without an S2_KEY, Semantic Scholar API rate limits or slow responses can lengthen author lookups. On forked repositories, scheduled GitHub Actions workflows are disabled by default and can also be disabled automatically after a period of repository inactivity, so they may need to be re-enabled manually.
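A common way to cope with the rate limits mentioned above is to retry with exponential backoff. The sketch below shows that generic pattern only; it is not taken from the project. `RateLimited` is a hypothetical exception that a client might raise on an HTTP 429 response, and `with_backoff` is an illustrative helper name. The `sleep` parameter is injectable so the delays can be inspected without actually waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class RateLimited(Exception):
    """Hypothetical exception a Semantic Scholar client might raise on HTTP 429."""

def with_backoff(call: Callable[[], T], max_retries: int = 5, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Retry `call` on RateLimited, waiting up to base_delay * 2**attempt seconds."""
    attempt = 0
    while True:
        try:
            return call()
        except RateLimited:
            if attempt >= max_retries:
                raise  # give up and surface the error
            # exponential backoff with "full jitter": a random wait in [0, cap]
            sleep(random.uniform(0, base_delay * 2 ** attempt))
            attempt += 1

# Demo: a stand-in lookup that is rate-limited twice before succeeding.
calls = {"n": 0}

def flaky_lookup() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimited()
    return "ok"

delays: list[float] = []
print(with_backoff(flaky_lookup, sleep=delays.append))  # ok
```

Randomizing each wait (jitter) keeps many concurrent retries from hitting the API at the same moment; supplying an API key remains the more direct fix, since it raises the limits themselves.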

Health Check

Last Commit

1 year ago

Responsiveness

Inactive

Star History

1 star in the last 30 days