factool by GAIR-NLP

Factuality detection tool for generative AI

created 2 years ago
886 stars

Top 41.6% on sourcepulse

Project Summary

FacTool is a framework for detecting factual errors in text generated by large language models across four domains: knowledge-based QA, code generation, mathematical reasoning, and scientific literature review. It assists researchers and developers in evaluating and improving the factual accuracy of LLM outputs.

How It Works

FacTool employs a tool-augmented approach, leveraging external tools and LLM-based reasoning to verify claims. For knowledge-based QA, it uses search engines (Serper) and web scrapers to find evidence. For code generation, it checks for execution errors. For math, it verifies calculations. For scientific literature, it validates citations against actual publications. The framework breaks down responses into claims, generates queries for verification, retrieves evidence, and assesses factuality at both claim and response levels.
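The four stages described above (claim extraction, query generation, evidence retrieval, verification) can be illustrated with a minimal sketch. This is not FacTool's code: `check_response`, `ClaimVerdict`, and the `toy_*` helpers are hypothetical names, the stages are passed in as plain functions so the example runs offline, and the rule that a response is factual only when every claim is factual is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ClaimVerdict:
    """Result of checking one claim (hypothetical structure)."""
    claim: str
    queries: List[str]
    evidence: List[str]
    factual: bool


def check_response(
    response: str,
    extract_claims: Callable[[str], List[str]],
    generate_queries: Callable[[str], List[str]],
    retrieve_evidence: Callable[[str], List[str]],
    verify: Callable[[str, List[str]], bool],
) -> dict:
    """Run the four stages on one response and aggregate the verdicts."""
    verdicts = []
    for claim in extract_claims(response):
        queries = generate_queries(claim)
        # Pool the evidence returned for every query of this claim.
        evidence = [snippet for q in queries for snippet in retrieve_evidence(q)]
        verdicts.append(ClaimVerdict(claim, queries, evidence, verify(claim, evidence)))
    # Assumption: the response counts as factual only if all claims do.
    return {
        "claims": verdicts,
        "response_factual": all(v.factual for v in verdicts),
    }


# Toy stand-ins; a real pipeline would call an LLM and a search API here.
KNOWN_FACTS = ["Paris is the capital of France"]


def toy_extract(response: str) -> List[str]:
    return [s.strip() for s in response.split(".") if s.strip()]


def toy_queries(claim: str) -> List[str]:
    return [claim]


def toy_retrieve(query: str) -> List[str]:
    return [fact for fact in KNOWN_FACTS if fact == query]


def toy_verify(claim: str, evidence: List[str]) -> bool:
    return claim in evidence


result = check_response(
    "Paris is the capital of France. Berlin is the capital of Spain.",
    toy_extract, toy_queries, toy_retrieve, toy_verify,
)
for verdict in result["claims"]:
    print(verdict.claim, "->", verdict.factual)
print("response-level:", result["response_factual"])
```

Passing the stages in as functions keeps the control flow visible and lets each domain substitute its own tools, for example a code executor in place of the search-based retriever.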

Quick Start & Requirements

  • Install via pip: pip install factool
  • Requires an OpenAI API key for the foundation models (GPT-3.5-turbo or GPT-4).
  • Knowledge-based QA requires a Serper API key.
  • Scientific literature review requires a Scraper API key.
  • See the Installation and Quick Start sections of the repository README for detailed setup.
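Because each domain needs a different set of keys, a small pre-flight check can catch a missing key before any paid API call is made. The sketch below is illustrative only: the environment variable names (`OPENAI_API_KEY`, `SERPER_API_KEY`, `SCRAPER_API_KEY`) and the category labels are assumptions that should be confirmed against the repository README, and `missing_keys` is a hypothetical helper, not part of the `factool` package.

```python
import os
from typing import List, Mapping, Optional

# Assumed variable names and category labels; verify them in the README.
REQUIRED_KEYS = {
    "kbqa": ["OPENAI_API_KEY", "SERPER_API_KEY"],
    "code": ["OPENAI_API_KEY"],
    "math": ["OPENAI_API_KEY"],
    "scientific": ["OPENAI_API_KEY", "SCRAPER_API_KEY"],
}


def missing_keys(category: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or empty for a category."""
    env = os.environ if env is None else env
    if category not in REQUIRED_KEYS:
        raise ValueError(f"unknown category: {category!r}")
    return [name for name in REQUIRED_KEYS[category] if not env.get(name)]


if __name__ == "__main__":
    for category in REQUIRED_KEYS:
        missing = missing_keys(category)
        status = "ready" if not missing else "missing " + ", ".join(missing)
        print(f"{category}: {status}")
```

The optional `env` argument lets the check be run against a plain dictionary, which is convenient for testing without touching the real environment.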

Highlighted Details

  • Supports factuality detection across four distinct domains: KBQA, code, math, and scientific literature.
  • Provides a factuality leaderboard comparing different LLMs.
  • Offers a ChatGPT plugin for direct integration.
  • Includes a separate model, Halu-J, for critique-based hallucination judgment.

Maintenance & Community

The project is hosted under the GAIR-NLP organization. The README provides a primary citation that lists the contributing authors.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The framework depends on external paid APIs (OpenAI, Serper, Scraper), so every run incurs usage costs. Assessment accuracy depends on the quality of the underlying LLM and on the effectiveness of the verification tools. In the provided example, the scientific literature review module reported 0% response-level factuality, which may point to limitations in citation verification, although a single example is not conclusive.

Health Check

  • Last commit: 11 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

20 stars in the last 90 days
