factool by GAIR-NLP

Factuality detection tool for generative AI

Created 2 years ago
888 stars

Top 40.8% on SourcePulse

View on GitHub
Project Summary

FacTool is a framework for detecting factual errors in text generated by large language models across four domains: knowledge-based QA, code generation, mathematical reasoning, and scientific literature review. It assists researchers and developers in evaluating and improving the factual accuracy of LLM outputs.

How It Works

FacTool employs a tool-augmented approach, leveraging external tools and LLM-based reasoning to verify claims. For knowledge-based QA, it uses search engines (Serper) and web scrapers to find evidence. For code generation, it checks for execution errors. For math, it verifies calculations. For scientific literature, it validates citations against actual publications. The framework breaks down responses into claims, generates queries for verification, retrieves evidence, and assesses factuality at both claim and response levels.
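The claim-extract, query, retrieve, verify loop described above can be sketched roughly as follows. Every function here is an illustrative stub, not FacTool's actual API; the real framework delegates each stage to an LLM or an external tool (search, code executor, calculator, citation lookup):

```python
# Illustrative sketch of FacTool's four-stage pipeline.
# All names below are hypothetical stubs, not FacTool's real API.

def extract_claims(response: str) -> list[str]:
    # Stage 1: an LLM splits the response into atomic, checkable claims.
    # Stubbed here as naive sentence splitting.
    return [s.strip() for s in response.split(".") if s.strip()]

def generate_queries(claim: str) -> list[str]:
    # Stage 2: an LLM turns each claim into verification queries.
    return [claim]

def retrieve_evidence(queries: list[str]) -> list[str]:
    # Stage 3: a domain tool (Serper search, code execution, calculation,
    # citation lookup) gathers evidence. Stubbed as a placeholder lookup.
    return [f"evidence for: {q}" for q in queries]

def verify(claim: str, evidence: list[str]) -> bool:
    # Stage 4: an LLM judges the claim against the retrieved evidence.
    return bool(evidence)

def check_response(response: str) -> dict:
    # Claim-level labels, then a response-level verdict:
    # the response counts as factual only if every claim passes.
    claim_labels = {
        c: verify(c, retrieve_evidence(generate_queries(c)))
        for c in extract_claims(response)
    }
    return {"claims": claim_labels, "factual": all(claim_labels.values())}
```

The two-level output mirrors FacTool's reporting: per-claim verdicts plus an aggregate response-level factuality score.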

Quick Start & Requirements

  • Install via pip: pip install factool
  • Requires an OpenAI API key for the foundation model (GPT-3.5-turbo or GPT-4).
  • Knowledge-based QA requires a Serper API key.
  • Scientific literature review requires a Scraper API key.
  • See Installation and Quick Start for detailed setup.
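A minimal call follows the pattern shown in the project's README; the input field names (`prompt`, `response`, `category`) and category values below should be double-checked against the current documentation, and the actual run is guarded here because it requires the package plus the API keys listed above:

```python
import os

# Hedged sketch of the basic FacTool call pattern.
inputs = [
    {
        "prompt": "Who proved Fermat's Last Theorem?",
        "response": "Andrew Wiles proved Fermat's Last Theorem in 1994.",
        "category": "kbqa",  # per the README: kbqa, code, math, or scientific
    },
]

# Running the check needs factool installed and the relevant API keys set.
if os.getenv("OPENAI_API_KEY") and os.getenv("SERPER_API_KEY"):
    from factool import Factool
    factool_instance = Factool("gpt-3.5-turbo")  # or "gpt-4"
    response_list = factool_instance.run(inputs)
```

Each input dict pairs a prompt with the model response to be checked; the `category` field selects which of the four verification pipelines is applied.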

Highlighted Details

  • Supports factuality detection across four distinct domains: KBQA, code, math, and scientific literature.
  • Provides a factuality leaderboard comparing different LLMs.
  • Offers a ChatGPT plugin for direct integration.
  • Includes a separate model, Halu-J, for critique-based hallucination judgment.

Maintenance & Community

The project is maintained under the GAIR-NLP organization, with multiple contributing authors credited in its citation; the FacTool paper serves as the primary reference.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The framework relies heavily on external API keys (OpenAI, Serper, Scraper), all of which incur usage costs. Factuality-assessment accuracy depends on the quality of the underlying LLM and the effectiveness of the verification tools. The scientific literature review module showed 0% response-level factuality in the provided example, suggesting limitations in citation verification.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 30 days

Explore Similar Projects

Starred by Eric Zhu (Coauthor of AutoGen; Research Scientist at Microsoft Research) and Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA).

DS-1000 by xlang-ai

  • Top 0.4% on SourcePulse
  • 256 stars
  • Benchmark for data science code generation
  • Created 2 years ago · Updated 10 months ago
Starred by Dan Abramov (Core Contributor to React; Coauthor of Redux, Create React App) and Edward Sun (Research Scientist at Meta Superintelligence Lab).

LeanDojo by lean-dojo

  • Top 0% on SourcePulse
  • 705 stars
  • Machine learning for theorem proving in Lean
  • Created 2 years ago · Updated 5 days ago