factool by GAIR-NLP

Factuality detection tool for generative AI

Created 2 years ago
888 stars

Top 40.8% on SourcePulse

View on GitHub
Project Summary

FacTool is a framework for detecting factual errors in text generated by large language models across four domains: knowledge-based QA, code generation, mathematical reasoning, and scientific literature review. It assists researchers and developers in evaluating and improving the factual accuracy of LLM outputs.

How It Works

FacTool employs a tool-augmented approach, leveraging external tools and LLM-based reasoning to verify claims. For knowledge-based QA, it uses search engines (Serper) and web scrapers to find evidence. For code generation, it checks for execution errors. For math, it verifies calculations. For scientific literature, it validates citations against actual publications. The framework breaks down responses into claims, generates queries for verification, retrieves evidence, and assesses factuality at both claim and response levels.
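The claim-extract, query, retrieve, verify loop described above can be sketched roughly as follows. Every function here is an illustrative stub, not FacTool's actual API; the real framework delegates each stage to an LLM or an external tool (search, code executor, calculator, citation lookup):

```python
# Illustrative sketch of FacTool's four-stage pipeline.
# All names below are hypothetical stubs, not FacTool's real API.

def extract_claims(response: str) -> list[str]:
    # Stage 1: an LLM splits the response into atomic, checkable claims.
    # Stubbed here as naive sentence splitting.
    return [s.strip() for s in response.split(".") if s.strip()]

def generate_queries(claim: str) -> list[str]:
    # Stage 2: an LLM turns each claim into verification queries.
    return [claim]

def retrieve_evidence(queries: list[str]) -> list[str]:
    # Stage 3: a domain tool (Serper search, code execution, calculation,
    # citation lookup) gathers evidence. Stubbed as a placeholder lookup.
    return [f"evidence for: {q}" for q in queries]

def verify(claim: str, evidence: list[str]) -> bool:
    # Stage 4: an LLM judges the claim against the retrieved evidence.
    return bool(evidence)

def check_response(response: str) -> dict:
    # Claim-level labels, then a response-level verdict:
    # the response counts as factual only if every claim passes.
    claim_labels = {
        c: verify(c, retrieve_evidence(generate_queries(c)))
        for c in extract_claims(response)
    }
    return {"claims": claim_labels, "factual": all(claim_labels.values())}
```

The two-level output mirrors FacTool's reporting: per-claim verdicts plus an aggregate response-level factuality score.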

Quick Start & Requirements

  • Install via pip: pip install factool
  • Requires an OpenAI API key for the foundation model (GPT-3.5-turbo or GPT-4).
  • Knowledge-based QA requires a Serper API key.
  • Scientific literature review requires a Scraper API key.
  • See Installation and Quick Start for detailed setup.
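A minimal call follows the pattern shown in the project's README; the input field names (`prompt`, `response`, `category`) and category values below should be double-checked against the current documentation, and the actual run is guarded here because it requires the package plus the API keys listed above:

```python
import os

# Hedged sketch of the basic FacTool call pattern.
inputs = [
    {
        "prompt": "Who proved Fermat's Last Theorem?",
        "response": "Andrew Wiles proved Fermat's Last Theorem in 1994.",
        "category": "kbqa",  # per the README: kbqa, code, math, or scientific
    },
]

# Running the check needs factool installed and the relevant API keys set.
if os.getenv("OPENAI_API_KEY") and os.getenv("SERPER_API_KEY"):
    from factool import Factool
    factool_instance = Factool("gpt-3.5-turbo")  # or "gpt-4"
    response_list = factool_instance.run(inputs)
```

Each input dict pairs a prompt with the model response to be checked; the `category` field selects which of the four verification pipelines is applied.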

Highlighted Details

  • Supports factuality detection across four distinct domains: KBQA, code, math, and scientific literature.
  • Provides a factuality leaderboard comparing different LLMs.
  • Offers a ChatGPT plugin for direct integration.
  • Includes a separate model, Halu-J, for critique-based hallucination judgment.

Maintenance & Community

The project is maintained under the GAIR-NLP organization, with multiple contributing authors credited in its citation; the FacTool paper serves as the primary reference.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The framework relies heavily on external API keys (OpenAI, Serper, Scraper), all of which incur usage costs. Factuality-assessment accuracy depends on the quality of the underlying LLM and the effectiveness of the verification tools. The scientific literature review module showed 0% response-level factuality in the provided example, suggesting limitations in citation verification.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 2 stars in the last 30 days

Explore Similar Projects

Starred by Eric Zhu (Coauthor of AutoGen; Research Scientist at Microsoft Research) and Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA).

DS-1000 by xlang-ai

  • Top 0.4% on SourcePulse
  • 256 stars
  • Benchmark for data science code generation
  • Created 2 years ago · Updated 10 months ago
Starred by Dan Abramov (Core Contributor to React; Coauthor of Redux, Create React App) and Edward Sun (Research Scientist at Meta Superintelligence Lab).

LeanDojo by lean-dojo

  • Top 0% on SourcePulse
  • 705 stars
  • Machine learning for theorem proving in Lean
  • Created 2 years ago · Updated 5 days ago