deepxiv_sdk by DeepXiv

AI agent for research paper search and progressive reading

Created 5 months ago

736 stars

Top 46.2% on SourcePulse

Project Summary

This Python package takes an agent-first approach to searching and progressively reading research papers, primarily from arXiv and PubMed Central. It targets engineers, researchers, and power users who need to integrate academic literature access into AI agent workflows. Rather than downloading full papers, agents read the most valuable sections first, which conserves both token budget and research time.

How It Works

DeepXiv is built around two core workflows: Search + Progressive Content Access and Trending + Popularity signals. Its CLI-first design allows agents to function like researchers: search broadly, judge quickly, and then read only the most pertinent parts. The key innovation is "progressive reading," where agents can inspect papers via --brief (summary, TLDR, keywords), --head (structure, token distribution), or --section (specific valuable parts like Introduction or Experiments), rather than loading entire documents. This layered access suits agents that work under a limited token budget and must judge a paper's value for a specific task.
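The budgeting idea behind progressive reading can be illustrated with a minimal Python sketch. It assumes the per-section token counts have already been obtained (for example from the --head view) and are available as a plain dictionary; that data shape, the function name pick_sections, and the sample numbers are all hypothetical and are not part of the SDK.

```python
from typing import Dict, List


def pick_sections(token_counts: Dict[str, int], preferred: List[str], budget: int) -> List[str]:
    """Greedily pick preferred sections, in priority order, that fit the token budget."""
    chosen: List[str] = []
    remaining = budget
    for name in preferred:
        cost = token_counts.get(name)
        if cost is None:
            continue  # this paper has no section with that name
        if cost <= remaining:
            chosen.append(name)
            remaining -= cost
    return chosen


# Hypothetical token distribution, standing in for what a --head view might report.
example_counts = {
    "Introduction": 900,
    "Related Work": 1500,
    "Method": 2600,
    "Experiments": 2200,
    "Conclusion": 400,
}

# The agent ranks sections by expected value for its task, then reads what fits.
selected = pick_sections(example_counts, ["Introduction", "Experiments", "Method"], budget=4000)
print(selected)
```

With a 4,000-token budget, the sketch keeps Introduction and Experiments and skips Method, which would no longer fit. A real agent would likely rank sections per task rather than use a fixed priority list.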

Quick Start & Requirements

Primary install: pip install deepxiv-sdk. For the full stack including the built-in research agent: pip install "deepxiv-sdk[all]".
Prerequisites: Python. No extra setup is required before the first query; the CLI automatically registers a free anonymous token (1,000 requests/day) on first use, saved to ~/.env.
Links:
- API Documentation: https://data.rag.ac.cn/api/docs
- 中文文档: README.zh.md
- GitHub Issues: https://github.com/qhjqhj00/deepxiv_sdk/issues

Highlighted Details

Progressive Reading CLI: Commands like deepxiv paper <id> --brief, --head, and --section <name> enable granular content access, crucial for agent workflows.
Trending & Popularity: Discover trending papers based on social signals and analyze paper-level propagation metrics (views, tweets, likes).
Agent Integration: Designed to plug into agent runtimes (e.g., Codex, Claude Code); also includes a built-in research agent.
Data Sources: Integrates arXiv, PubMed Central (PMC), and Semantic Scholar metadata, focusing on open-access literature.
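For an agent written in Python, one way to drive the progressive reading commands is to build the argument list and hand it to subprocess. The sketch below only constructs the command; it uses the flags named above (--brief, --head, --section <name>), while the helper name build_paper_command and the paper ID are placeholders introduced for illustration. The CLI's output format is not described here, so parsing is left out.

```python
import shlex
from typing import List, Optional


def build_paper_command(paper_id: str, view: str, section: Optional[str] = None) -> List[str]:
    """Build the argv for `deepxiv paper <id>` with one progressive-reading flag."""
    if view not in ("brief", "head", "section"):
        raise ValueError(f"unknown view: {view!r}")
    argv = ["deepxiv", "paper", paper_id]
    if view == "section":
        if not section:
            raise ValueError("the section view needs a section name")
        argv += ["--section", section]
    else:
        argv.append(f"--{view}")
    return argv


# "2401.00001" is a placeholder ID, not a reference to a specific paper.
cmd = build_paper_command("2401.00001", "section", "Introduction")
print(shlex.join(cmd))
```

To actually execute the command, the list can be passed to subprocess.run(cmd, capture_output=True, text=True, check=True), assuming the deepxiv CLI is installed and on the PATH.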

Maintenance & Community

Support is available through GitHub Issues. Registered tokens have a standard limit of 10,000 requests/day; users who need more can contact tommy[at]chien.io and describe their use case. The roadmap points toward a 100M+ scale academic paper data interface.

Licensing & Compatibility

The project is released under the MIT License, which is permissive and generally suitable for commercial use and linking within closed-source projects.

Limitations & Caveats

Free auto-registered tokens are limited to 1,000 requests per day; registered tokens get 10,000. Each web search counts as 20 requests against the daily limit. The README mentions that a "full-stack research platform is on the way," suggesting the current SDK may evolve or is part of a larger, developing ecosystem.
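The quota arithmetic is worth working through before planning an agent run. The following sketch uses only the figures stated above; the constant and function names are illustrative, and it assumes quota is consumed in whole requests.

```python
FREE_DAILY_REQUESTS = 1_000         # auto-registered anonymous token
REGISTERED_DAILY_REQUESTS = 10_000  # registered token
WEB_SEARCH_COST = 20                # requests charged per web search


def max_web_searches(daily_limit: int, requests_already_used: int = 0) -> int:
    """Return how many web searches still fit in the remaining daily quota."""
    remaining = daily_limit - requests_already_used
    if remaining < 0:
        raise ValueError("requests_already_used exceeds the daily limit")
    return remaining // WEB_SEARCH_COST


print(max_web_searches(FREE_DAILY_REQUESTS))        # 50
print(max_web_searches(REGISTERED_DAILY_REQUESTS))  # 500
print(max_web_searches(FREE_DAILY_REQUESTS, 400))   # 30
```

On the free tier, an agent that relies on web search alone exhausts its quota after 50 searches, so mixing in cheaper paper lookups stretches the daily budget considerably.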
