refchecker by markrussinovich

Academic reference checker for authors and reviewers

Created 1 year ago
290 stars

Top 90.8% on SourcePulse

Project Summary

RefChecker is a tool designed to validate the accuracy of references within academic papers, benefiting authors and reviewers by ensuring citation authenticity. It cross-references citations against Semantic Scholar, OpenAlex, and CrossRef, providing detailed reports on discrepancies.

How It Works

RefChecker employs a multi-source verification strategy, leveraging Semantic Scholar, OpenAlex, and CrossRef APIs to cross-check citation details. It integrates LLM-powered extraction (supporting OpenAI, Anthropic, Google, Azure, and vLLM) to robustly parse complex bibliographies and handle variations in citation formatting, enhancing accuracy and reducing manual effort.
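The multi-source strategy can be sketched as "query each database in turn, then compare the fields of whatever record comes back". The sketch below is a hypothetical illustration under that assumption, not RefChecker's code. The `Reference` record, the `compare` and `verify` functions, and the status strings are all invented for this example, and the real Semantic Scholar, OpenAlex and CrossRef lookups are replaced by stub functions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Reference:
    """Minimal citation record (hypothetical; not RefChecker's data model)."""
    title: str
    authors: list[str] = field(default_factory=list)
    year: Optional[int] = None
    doi: Optional[str] = None


# A "source" is any lookup function that returns its best-matching record,
# or None if it finds nothing. Real lookups would call Semantic Scholar,
# OpenAlex or CrossRef; in this sketch they are stubbed by the caller.
Source = Callable[[Reference], Optional[Reference]]


def compare(cited: Reference, found: Reference) -> list[str]:
    """Describe every field where the citation disagrees with the record."""
    issues = []
    if cited.title.casefold() != found.title.casefold():
        issues.append(f"title: cited {cited.title!r}, found {found.title!r}")
    if cited.authors and found.authors and cited.authors[0].casefold() != found.authors[0].casefold():
        issues.append(f"first author: cited {cited.authors[0]!r}, found {found.authors[0]!r}")
    if cited.year is not None and found.year is not None and cited.year != found.year:
        issues.append(f"year: cited {cited.year}, found {found.year}")
    if cited.doi and found.doi and cited.doi.lower() != found.doi.lower():
        issues.append(f"doi: cited {cited.doi}, found {found.doi}")
    return issues


def verify(cited: Reference, sources: list[Source]) -> tuple[str, list[str]]:
    """Query sources in order; the first one that returns a record decides."""
    for lookup in sources:
        record = lookup(cited)
        if record is not None:
            issues = compare(cited, record)
            return ("verified" if not issues else "mismatch", issues)
    return ("unverified", [])
```

First-match-wins is the simplest policy to show. A more careful checker might query every source and reconcile disagreements between them before reporting.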

Quick Start & Requirements

  • Primary Install:
    • Docker: docker run -p 8000:8000 ghcr.io/markrussinovich/refchecker:latest (Access via http://localhost:8000)
    • Pip (Web UI + CLI + LLM): pip install academic-refchecker[llm,webui]
    • Pip (CLI only): pip install academic-refchecker[llm]
  • Prerequisites: Python 3.7+ (3.10+ recommended). LLM API keys (e.g., ANTHROPIC_API_KEY, OPENAI_API_KEY) are recommended because they enable LLM-powered reference extraction. Node.js 18+ is required only for Web UI development.
  • Access: Web UI at http://localhost:8000 (Docker or pip install); CLI via the academic-refchecker command.
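Because these keys are optional, a script that wraps RefChecker might check which ones are set before starting a run. The sketch below does that. The variable names come from this summary, but the `preflight` function and the preference order among LLM keys are hypothetical choices for illustration, not RefChecker's own provider selection.

```python
import os
from typing import Mapping, Optional

# Key names taken from this summary; the order (Anthropic before OpenAI)
# is an arbitrary assumption for this sketch.
LLM_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")
SEMANTIC_SCHOLAR_KEY = "SEMANTIC_SCHOLAR_API_KEY"


def preflight(env: Optional[Mapping[str, str]] = None) -> dict:
    """Report which optional API keys are set (empty values count as unset)."""
    env = os.environ if env is None else env
    return {
        # First LLM key that is present and non-empty, or None if there is none.
        "llm_key": next((k for k in LLM_KEYS if env.get(k)), None),
        # Whether a Semantic Scholar key is available.
        "semantic_scholar_key": bool(env.get(SEMANTIC_SCHOLAR_KEY)),
    }
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the check easy to test.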

Highlighted Details

  • Supports multiple input formats: ArXiv IDs/URLs, PDFs, LaTeX, and text files.
  • Integrates with major LLM providers (OpenAI, Anthropic, Google, Azure) and local vLLM for advanced extraction.
  • Performs comprehensive checks on titles, authors, years, venues, DOIs, and ArXiv IDs.
  • Features smart matching to reconcile formatting variations (e.g., "BERT" vs. "B-ERT").
  • Provides detailed reports categorizing issues as Errors, Warnings, Suggestions, or Unverified references.
  • Web UI enables bulk checking of multiple papers or ZIP archives.
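One plausible way to implement the "smart matching" described above is to reduce each title to a normalized key and fall back to a fuzzy similarity score. The sketch below is an assumption about the general technique, not RefChecker's actual matcher. It uses the standard library's `difflib.SequenceMatcher`, and the 0.9 threshold is an arbitrary illustrative value.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Reduce a title to a comparison key of lowercase letters and digits."""
    # Case-fold first, then decompose accented letters ("ö" -> "o" + combining mark).
    text = unicodedata.normalize("NFKD", title.casefold())
    # Drop the combining marks left over from the decomposition.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove spaces, hyphens, colons, braces and other non-word characters,
    # so "B-ERT" and "BERT" (or "pre-training" and "pretraining") collide.
    return re.sub(r"[\W_]+", "", text)


def titles_match(a: str, b: str, threshold: float = 0.9) -> bool:
    """Exact match on normalized keys, else a fuzzy similarity ratio."""
    key_a, key_b = normalize_title(a), normalize_title(b)
    if key_a == key_b:
        return True
    return SequenceMatcher(None, key_a, key_b).ratio() >= threshold
```

Stripping spaces as well as punctuation trades a little precision for robustness. "word level" and "word-level" become the same key, which is usually what a citation checker wants.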

Maintenance & Community

The README does not list specific contributors, sponsorships, or community channels (such as Discord or Slack).

Licensing & Compatibility

The project is released under the MIT License, which permits commercial use and integration into closed-source projects. It imposes no copyleft obligations; the only requirement is to retain the copyright and license notice.

Limitations & Caveats

Verification is considerably faster when SEMANTIC_SCHOLAR_API_KEY is set, because authenticated Semantic Scholar API requests get higher rate limits. Some LLMs, such as GPT-4o, may occasionally hallucinate DOIs. Local LLM inference requires running a separate vLLM server, and multi-user mode requires configuring an OAuth provider.
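Since an LLM can produce a DOI that does not exist, a cheap first filter is a syntax check before any lookup. The sketch below is a hypothetical helper, not part of RefChecker. Its pattern follows CrossRef's published guidance for matching modern DOIs. A hallucinated DOI can still be perfectly well-formed, so a DOI that passes this check must still be resolved against a source such as CrossRef.

```python
import re
from typing import Optional

# Syntax-only pattern adapted from CrossRef's guidance for modern DOIs;
# it cannot tell whether a DOI actually exists.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/[-._;()/:A-Z0-9]+$", re.IGNORECASE)

# Common wrappers around a bare DOI (an illustrative, not exhaustive, list).
PREFIXES = ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:")


def clean_doi(raw: str) -> Optional[str]:
    """Strip a common prefix and return the bare DOI if it is well-formed."""
    doi = raw.strip()
    for prefix in PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    return doi if DOI_PATTERN.match(doi) else None
```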

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 38 stars in the last 30 days
