refchecker by markrussinovich

Academic reference checker for authors and reviewers

Created 1 year ago

425 stars

Top 68.8% on SourcePulse


1 Expert Loves This Project

Gagan Bansal

Coauthor of AutoGen; Research Scientist at Microsoft Research

Project Summary

RefChecker validates the accuracy of references in academic papers, helping authors and reviewers confirm that citations are authentic. It cross-references citations against Semantic Scholar, OpenAlex, and CrossRef and produces detailed reports on discrepancies.

How It Works

RefChecker employs a multi-source verification strategy, leveraging Semantic Scholar, OpenAlex, and CrossRef APIs to cross-check citation details. It integrates LLM-powered extraction (supporting OpenAI, Anthropic, Google, Azure, and vLLM) to robustly parse complex bibliographies and handle variations in citation formatting, enhancing accuracy and reducing manual effort.
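One way to picture the multi-source strategy is as a fallback chain: try each database in turn and keep the first record that matches. The sketch below is only an illustration under stated assumptions. The `Record` type, the lookup callables, and the in-memory stand-ins for Semantic Scholar, OpenAlex, and CrossRef are hypothetical and are not RefChecker's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Record:
    """Hypothetical normalized bibliographic record returned by one source."""
    title: str
    year: Optional[int] = None
    doi: Optional[str] = None


# A lookup takes a cited title and returns a matching record or None.
Lookup = Callable[[str], Optional[Record]]


def verify(title: str,
           sources: List[Tuple[str, Lookup]]) -> Tuple[Optional[str], Optional[Record]]:
    """Query each source in order and return (source_name, record) for the first hit.

    Returns (None, None) when no source knows the work, i.e. the reference
    would end up reported as unverified.
    """
    for name, lookup in sources:
        record = lookup(title)
        if record is not None:
            return name, record
    return None, None


def make_stub(db: Dict[str, Record]) -> Lookup:
    """Hypothetical stand-in for a real API client: case-insensitive exact title lookup."""
    index = {k.lower(): v for k, v in db.items()}
    return lambda title: index.get(title.lower())


if __name__ == "__main__":
    attention = Record("Attention Is All You Need", 2017)
    sources = [
        ("semantic_scholar", make_stub({})),  # pretend this source misses
        ("openalex", make_stub({attention.title: attention})),
        ("crossref", make_stub({})),
    ]
    print(verify("attention is all you need", sources))
```

Ordering the sources this way means a miss in one database does not immediately mark a reference as unverified; a real client would replace each stub with an HTTP call and a fuzzier match.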

Quick Start & Requirements

Primary Install:
- Docker: docker run -p 8000:8000 ghcr.io/markrussinovich/refchecker:latest (Access via http://localhost:8000)
- Pip (Web UI + CLI + LLM): pip install "academic-refchecker[llm,webui]"
- Pip (CLI only): pip install "academic-refchecker[llm]" (quote the package spec so shells such as zsh do not treat the brackets as a glob)
Prerequisites: Python 3.7+ (3.10+ recommended). LLM API keys (e.g., ANTHROPIC_API_KEY, OPENAI_API_KEY) are recommended for enhanced performance. Node.js 18+ is required for Web UI development.
Links: Web UI at http://localhost:8000 (Docker or pip install); CLI: academic-refchecker.
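A quick preflight check of these prerequisites can save a failed setup. The helper below is a hypothetical sketch, not part of RefChecker: it compares the interpreter version against the 3.7 minimum and 3.10 recommendation and reports which of the example LLM API key variables listed above are set.

```python
import os
import sys
from typing import Dict, Mapping, Tuple

# Thresholds and variable names taken from the prerequisites above.
MINIMUM = (3, 7)
RECOMMENDED = (3, 10)
LLM_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY")


def preflight(version: Tuple[int, ...] = tuple(sys.version_info[:2]),
              env: Mapping[str, str] = os.environ) -> Dict[str, object]:
    """Hypothetical helper: summarize whether an environment meets the listed prerequisites."""
    return {
        "python_ok": version >= MINIMUM,
        "python_recommended": version >= RECOMMENDED,
        # LLM keys are recommended, not required, so only report which are present.
        "llm_keys_set": [k for k in LLM_KEYS if env.get(k)],
    }


if __name__ == "__main__":
    report = preflight()
    if not report["python_ok"]:
        sys.exit("Python 3.7+ is required")
    print(report)
```

Passing `version` and `env` explicitly makes the check easy to exercise without touching the real environment.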

Highlighted Details

Supports multiple input formats: ArXiv IDs/URLs, PDFs, LaTeX, and text files.
Integrates with major LLM providers (OpenAI, Anthropic, Google, Azure) and local vLLM for advanced extraction.
Performs comprehensive checks on titles, authors, years, venues, DOIs, and ArXiv IDs.
Features smart matching to reconcile formatting variations (e.g., "BERT" vs. "B-ERT").
Provides detailed reports categorizing issues as Errors, Warnings, Suggestions, or Unverified references.
Web UI enables bulk checking of multiple papers or ZIP archives.
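Smart matching and issue categorization could be approximated roughly as follows. This is a sketch under stated assumptions, not RefChecker's implementation: the normalization rule (lowercase, drop non-alphanumerics), the 0.9 similarity threshold using `difflib.SequenceMatcher`, and the rules mapping mismatches to Error, Warning, or Unverified are all hypothetical choices.

```python
import re
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def normalize_title(title: str) -> str:
    """Lowercase and keep only letters, digits, and single spaces,
    so formatting variants such as "BERT" and "B-ERT" compare equal."""
    cleaned = re.sub(r"[^0-9a-z ]+", "", title.lower())
    return " ".join(cleaned.split())


def titles_match(cited: str, found: str, threshold: float = 0.9) -> bool:
    """Treat two titles as the same work if their normalized forms are similar enough."""
    a, b = normalize_title(cited), normalize_title(found)
    return SequenceMatcher(None, a, b).ratio() >= threshold


def classify(cited_title: str, cited_year: Optional[int],
             found: Optional[Tuple[str, Optional[int]]]) -> List[Tuple[str, str]]:
    """Return (category, message) pairs for one reference.

    `found` is the (title, year) of the best database match, or None if no source matched.
    """
    if found is None:
        return [("Unverified", "no matching record in any source")]
    found_title, found_year = found
    issues: List[Tuple[str, str]] = []
    if not titles_match(cited_title, found_title):
        issues.append(("Error", f"title differs: {found_title!r}"))
    if cited_year is not None and found_year is not None and cited_year != found_year:
        # Hypothetical rule: an off-by-one year (e.g. preprint vs. proceedings) is only a warning.
        category = "Warning" if abs(cited_year - found_year) == 1 else "Error"
        issues.append((category, f"year {cited_year} vs {found_year}"))
    return issues


if __name__ == "__main__":
    print(titles_match("BERT", "B-ERT"))
    print(classify("Attention Is All You Need", 2018, ("Attention is all you need", 2017)))
```

Normalizing before fuzzy comparison keeps the threshold meaningful: punctuation and casing differences disappear entirely, so the similarity ratio only has to absorb genuine wording changes.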

Maintenance & Community

The README does not list specific contributors, sponsorships, or community channels such as Discord or Slack.

Licensing & Compatibility

The project is released under the MIT License, which permits commercial use and integration into closed-source projects; it has no copyleft terms and only requires preserving the copyright and license notice.

Limitations & Caveats

Setting a SEMANTIC_SCHOLAR_API_KEY significantly speeds up verification. Some LLMs, such as GPT-4o, may occasionally hallucinate DOIs. Local LLM inference requires running a separate vLLM server, and multi-user mode requires configuring an OAuth provider.
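For the speed-up mentioned above, the key is presumably read from the environment. The sketch below assumes that and shows how a client would attach it to a Semantic Scholar Academic Graph paper-search request through the `x-api-key` header, which is how that API accepts keys. `build_search_request` is a hypothetical helper, not RefChecker code, and no request is actually sent.

```python
import os
from typing import Dict, Mapping, Optional, Tuple
from urllib.parse import urlencode

SEARCH_URL = "https://api.semanticscholar.org/graph/v1/paper/search"


def build_search_request(title: str,
                         env: Mapping[str, str] = os.environ) -> Tuple[str, Dict[str, str]]:
    """Hypothetical helper: return (url, headers) for a title search.

    The x-api-key header is added only when SEMANTIC_SCHOLAR_API_KEY is set.
    """
    params = {"query": title, "fields": "title,authors,year,externalIds", "limit": 5}
    url = f"{SEARCH_URL}?{urlencode(params)}"
    headers: Dict[str, str] = {}
    key: Optional[str] = env.get("SEMANTIC_SCHOLAR_API_KEY")
    if key:
        # Authenticated requests are subject to a higher rate limit than anonymous ones.
        headers["x-api-key"] = key
    return url, headers


if __name__ == "__main__":
    url, headers = build_search_request("Attention Is All You Need")
    print(url)
    print("authenticated:", "x-api-key" in headers)
```

To send the request, both values could be passed to `urllib.request.Request(url, headers=headers)` and opened with `urllib.request.urlopen`.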

Health Check

Last Commit

3 weeks ago

Responsiveness

Inactive

Star History

30 stars in the last 30 days