FActScore by shmsw25

Factual precision evaluation package for long-form text generation

created 2 years ago
366 stars

Top 78.1% on sourcepulse

Project Summary

FActScore is a Python package designed to evaluate the factual precision of long-form text generation, targeting researchers and developers working with large language models. It provides a fine-grained, atomic evaluation of generated content against a knowledge source, enabling precise measurement of factual accuracy.

How It Works

FActScore decomposes generated text into atomic facts, retrieves supporting evidence from a knowledge source (defaulting to Wikipedia), and then uses a model (such as ChatGPT or LLaMA) to verify each atomic fact against that evidence. This allows a granular assessment of accuracy that distinguishes factual precision from recall, and it is relatively inexpensive, with ChatGPT API usage estimated at roughly $1 per 100 sentences.
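The decompose → retrieve → verify loop can be sketched with toy stand-ins. Everything below is illustrative, not the package's actual API: the naive sentence splitting and set-membership check replace the LLM-based decomposition and retrieval-plus-verification steps the real system uses.

```python
def decompose(generation: str) -> list[str]:
    """Stand-in for LLM-based atomic fact decomposition:
    naively treat each sentence as one 'atomic fact'."""
    return [s.strip() for s in generation.split(".") if s.strip()]

def is_supported(fact: str, knowledge: set[str]) -> bool:
    """Stand-in for retrieval + model verification: a fact counts as
    supported if it appears verbatim in the toy knowledge source."""
    return fact in knowledge

def factual_precision(generation: str, knowledge: set[str]) -> float:
    """Fraction of atomic facts supported by the knowledge source."""
    facts = decompose(generation)
    supported = sum(is_supported(f, knowledge) for f in facts)
    return supported / len(facts) if facts else 0.0

knowledge = {"Marie Curie won two Nobel Prizes",
             "Marie Curie was born in Warsaw"}
text = "Marie Curie won two Nobel Prizes. Marie Curie was born in Paris."
print(factual_precision(text, knowledge))  # 0.5: one of two facts is supported
```

The point of the atomic decomposition is visible even in this toy version: scoring the text as a whole would yield a single pass/fail judgment, while per-fact scoring localizes exactly which claim is unsupported.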

Quick Start & Requirements

  • Install: pip install --upgrade factscore
  • Prerequisites: Python 3.7+, the spaCy English model (python -m spacy download en_core_web_sm).
  • Data Download: python -m factscore.download_data --llama_7B_HF_path "llama-7B" (requires LLaMA 7B HuggingFace weights for LLaMA-based evaluation; skip for ChatGPT-only evaluation).
  • Documentation: https://arxiv.org/abs/2305.14251

Highlighted Details

  • Evaluates factual precision at an atomic level.
  • Supports multiple evaluation backends: retrieval+ChatGPT and retrieval+llama+npm.
  • Offers a gamma hyperparameter for length penalty and an abstain_detection flag.
  • Allows custom knowledge source integration.
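The gamma hyperparameter penalizes responses that make very few claims, so a model cannot inflate precision by saying almost nothing. The exponential form below is an assumption for illustration (consult the paper for the exact rule); the shape it shows, with the penalty vanishing once the fact count reaches gamma, is the intended behavior.

```python
import math

def length_penalized_score(precision: float, num_facts: int,
                           gamma: int = 10) -> float:
    """Discount precision for short responses. The exp(1 - gamma/n)
    penalty used here is an illustrative assumption, not the package's
    confirmed formula; responses with at least gamma facts are unpenalized."""
    if num_facts <= 0:
        return 0.0
    penalty = math.exp(1 - gamma / num_facts) if num_facts < gamma else 1.0
    return precision * penalty

# A 3-fact response with perfect precision scores well below a 10-fact
# response with the same precision.
print(length_penalized_score(1.0, 3))
print(length_penalized_score(1.0, 10))  # 1.0
```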

Maintenance & Community

The project is associated with the EMNLP 2023 paper "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation." Further community interaction details are not explicitly provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Public availability on GitHub does not by itself grant reuse rights, so compatibility for commercial use or closed-source linking would require explicit license confirmation from the maintainers.

Limitations & Caveats

The default knowledge source is a Wikipedia dump from April 2023, which may not be up-to-date. The cost estimate is based on OpenAI API usage, which can fluctuate. Reproducing results requires specific model weights (e.g., LLaMA 7B) and potentially significant data downloads.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 22 stars in the last 90 days
