deepxiv_sdk by DeepXiv

AI agent for research paper search and progressive reading

Created 2 months ago
402 stars

Top 72.0% on SourcePulse

Project Summary

This Python package takes an agent-first approach to searching and progressively reading research papers, primarily from arXiv and PubMed Central. It targets engineers, researchers, and power users who need to integrate academic literature access into AI agent workflows. Agents read the most valuable sections instead of downloading full papers, which conserves both token budget and research time.

How It Works

DeepXiv is built around two core workflows: search with progressive content access, and trending and popularity signals. Its CLI-first design lets agents work like researchers: search broadly, judge quickly, then read only the most pertinent parts. The key idea is "progressive reading": instead of loading an entire document, an agent inspects a paper via --brief (summary, TLDR, keywords), --head (structure, token distribution), or --section (a specific part such as Introduction or Experiments). This layered access suits agents that have a limited token budget and need to judge a paper's value for a specific task before reading further.
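
The budgeting decision behind progressive reading can be sketched in a few lines of Python. This is an illustrative sketch only, not part of deepxiv_sdk: the function name pick_sections, the dictionary format, and the token counts are all hypothetical, standing in for whatever structure and token distribution --head reports. It walks a priority list of sections and keeps each one that still fits in the remaining token budget.

```python
from typing import Dict, List


def pick_sections(token_counts: Dict[str, int], preferred: List[str], budget: int) -> List[str]:
    """Return the preferred sections, in priority order, that fit within the token budget.

    Hypothetical helper for illustration; not a deepxiv_sdk function.
    """
    chosen: List[str] = []
    remaining = budget
    for name in preferred:
        cost = token_counts.get(name)
        if cost is None:
            continue  # this paper has no section with that name
        if cost <= remaining:
            chosen.append(name)
            remaining -= cost
    return chosen


# Assumed token distribution (made-up numbers, made-up format).
example_counts = {
    "Introduction": 900,
    "Related Work": 1500,
    "Method": 2200,
    "Experiments": 1800,
}

selected = pick_sections(
    example_counts,
    preferred=["Introduction", "Experiments", "Method"],
    budget=3000,
)
print(selected)
```

With these assumed numbers, Introduction and Experiments fit in the 3,000-token budget and Method does not, so an agent would request only the first two sections.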

Quick Start & Requirements

  • Primary install: pip install deepxiv-sdk. For the full stack including the built-in research agent: pip install "deepxiv-sdk[all]".
  • Prerequisites: Python. No extra setup is required before the first query; the CLI automatically registers a free anonymous token (1,000 requests/day) on first use, saved to ~/.env.
  • Links:
    • API Documentation: https://data.rag.ac.cn/api/docs
    • 中文文档: README.zh.md
    • GitHub Issues: https://github.com/qhjqhj00/deepxiv_sdk/issues

Highlighted Details

  • Progressive Reading CLI: Commands like deepxiv paper <id> --brief, --head, and --section <name> enable granular content access, crucial for agent workflows.
  • Trending & Popularity: Features to discover trending papers based on social signals and analyze paper-level propagation metrics (views, tweets, likes).
  • Agent Integration: Designed for seamless integration with agent runtimes (e.g., Codex, Claude Code) and includes a built-in research agent.
  • Data Sources: Integrates arXiv, PubMed Central (PMC), and Semantic Scholar metadata, focusing on open-access literature.
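
As a rough sketch of how an agent runtime might drive the progressive reading commands listed above, the Python below builds the command lines in the order brief, head, then sections. Only the deepxiv paper <id> command and the --brief, --head, and --section <name> flags come from the project description. The helper name progressive_commands and the placeholder paper ID are hypothetical, and the code only constructs argument lists without running them.

```python
import shlex
from typing import List, Sequence


def progressive_commands(paper_id: str, sections: Sequence[str] = ()) -> List[List[str]]:
    """Build argv lists for a brief -> head -> section reading pass.

    Hypothetical helper; it does not call the CLI itself.
    """
    commands = [
        ["deepxiv", "paper", paper_id, "--brief"],  # summary, TLDR, keywords
        ["deepxiv", "paper", paper_id, "--head"],   # structure, token distribution
    ]
    for name in sections:
        commands.append(["deepxiv", "paper", paper_id, "--section", name])
    return commands


PAPER_ID = "<id>"  # placeholder; substitute a real paper identifier

for argv in progressive_commands(PAPER_ID, ["Introduction", "Experiments"]):
    # shlex.join renders a shell-safe string for logging or copy-paste.
    print(shlex.join(argv))
```

With the CLI installed, each argument list could be passed to subprocess.run(argv, capture_output=True, text=True). Keeping the commands as lists rather than shell strings avoids quoting problems with section names that contain spaces.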

Maintenance & Community

Support is handled through GitHub Issues. Registered tokens have a standard limit of 10,000 requests/day; users who need more can contact tommy[at]chien.io and describe their use case. The roadmap points toward a 100M+ scale academic paper data interface.

Licensing & Compatibility

The project is released under the MIT License, which is permissive and generally suitable for commercial use and linking within closed-source projects.

Limitations & Caveats

Free auto-registered tokens are limited to 1,000 requests per day; registered tokens get 10,000. Each web search counts as 20 requests against the daily limit. The README notes that a "full-stack research platform is on the way," which suggests the current SDK is part of a larger ecosystem still under development and may change.
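
The quota figures above translate into simple arithmetic, sketched here in Python. The limits and the 20-request cost of a web search come from the text. The assumption that every other call costs exactly one request is mine, and the function names are hypothetical.

```python
FREE_DAILY_LIMIT = 1_000         # auto-registered anonymous token
REGISTERED_DAILY_LIMIT = 10_000  # registered token
WEB_SEARCH_COST = 20             # requests charged per web search


def max_web_searches(daily_limit: int) -> int:
    """Number of web searches that fit in a day if nothing else is requested."""
    return daily_limit // WEB_SEARCH_COST


def remaining_requests(daily_limit: int, web_searches: int, other_requests: int = 0) -> int:
    """Requests left after a mix of web searches and other calls.

    Assumes each non-search call costs one request (an assumption, not a documented rule).
    """
    used = web_searches * WEB_SEARCH_COST + other_requests
    return daily_limit - used


print(max_web_searches(FREE_DAILY_LIMIT))          # 50
print(max_web_searches(REGISTERED_DAILY_LIMIT))    # 500
print(remaining_requests(FREE_DAILY_LIMIT, 10, 300))  # 1000 - 200 - 300 = 500
```

On the free tier, web search alone exhausts the daily quota after 50 searches, so an agent that searches heavily may benefit from a registered token.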

Health Check

  • Last commit: 2 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 5
  • Star history: 400 stars in the last 30 days
