FActScore by shmsw25

Factual precision evaluation package for long-form text generation

created 2 years ago
366 stars

Top 78.1% on sourcepulse

Project Summary

FActScore is a Python package designed to evaluate the factual precision of long-form text generation, targeting researchers and developers working with large language models. It provides a fine-grained, atomic evaluation of generated content against a knowledge source, enabling precise measurement of factual accuracy.

How It Works

FActScore decomposes generated text into atomic facts, retrieves supporting evidence from a knowledge source (defaulting to Wikipedia), and then uses a model (such as ChatGPT or LLaMA) to verify each atomic fact against that evidence. This allows a granular assessment of accuracy that distinguishes factual precision from recall, and it is relatively inexpensive, with ChatGPT API usage estimated at roughly $1 per 100 sentences.
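The decompose → retrieve → verify loop can be sketched with toy stand-ins. Everything below is illustrative, not the package's actual API: the naive sentence splitting and set-membership check replace the LLM-based decomposition and retrieval-plus-verification steps the real system uses.

```python
def decompose(generation: str) -> list[str]:
    """Stand-in for LLM-based atomic fact decomposition:
    naively treat each sentence as one 'atomic fact'."""
    return [s.strip() for s in generation.split(".") if s.strip()]

def is_supported(fact: str, knowledge: set[str]) -> bool:
    """Stand-in for retrieval + model verification: a fact counts as
    supported if it appears verbatim in the toy knowledge source."""
    return fact in knowledge

def factual_precision(generation: str, knowledge: set[str]) -> float:
    """Fraction of atomic facts supported by the knowledge source."""
    facts = decompose(generation)
    supported = sum(is_supported(f, knowledge) for f in facts)
    return supported / len(facts) if facts else 0.0

knowledge = {"Marie Curie won two Nobel Prizes",
             "Marie Curie was born in Warsaw"}
text = "Marie Curie won two Nobel Prizes. Marie Curie was born in Paris."
print(factual_precision(text, knowledge))  # 0.5: one of two facts is supported
```

The point of the atomic decomposition is visible even in this toy version: scoring the text as a whole would yield a single pass/fail judgment, while per-fact scoring localizes exactly which claim is unsupported.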

Quick Start & Requirements

  • Install: pip install --upgrade factscore
  • Prerequisites: Python 3.7+, the spaCy English model (python -m spacy download en_core_web_sm).
  • Data Download: python -m factscore.download_data --llama_7B_HF_path "llama-7B" (requires LLaMA 7B HuggingFace weights for LLaMA-based evaluation; skip for ChatGPT-only evaluation).
  • Documentation: https://arxiv.org/abs/2305.14251

Highlighted Details

  • Evaluates factual precision at an atomic level.
  • Supports multiple evaluation backends: retrieval+ChatGPT and retrieval+llama+npm.
  • Offers a gamma hyperparameter for length penalty and an abstain_detection flag.
  • Allows custom knowledge source integration.
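The gamma hyperparameter penalizes responses that make very few claims, so a model cannot inflate precision by saying almost nothing. The exponential form below is an assumption for illustration (consult the paper for the exact rule); the shape it shows, with the penalty vanishing once the fact count reaches gamma, is the intended behavior.

```python
import math

def length_penalized_score(precision: float, num_facts: int,
                           gamma: int = 10) -> float:
    """Discount precision for short responses. The exp(1 - gamma/n)
    penalty used here is an illustrative assumption, not the package's
    confirmed formula; responses with at least gamma facts are unpenalized."""
    if num_facts <= 0:
        return 0.0
    penalty = math.exp(1 - gamma / num_facts) if num_facts < gamma else 1.0
    return precision * penalty

# A 3-fact response with perfect precision scores well below a 10-fact
# response with the same precision.
print(length_penalized_score(1.0, 3))
print(length_penalized_score(1.0, 10))  # 1.0
```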

Maintenance & Community

The project is associated with the EMNLP 2023 paper "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation." Further community interaction details are not explicitly provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Public availability on GitHub does not by itself grant reuse rights, so compatibility for commercial use or closed-source linking would require explicit license confirmation from the maintainers.

Limitations & Caveats

The default knowledge source is a Wikipedia dump from April 2023, which may not be up-to-date. The cost estimate is based on OpenAI API usage, which can fluctuate. Reproducing results requires specific model weights (e.g., LLaMA 7B) and potentially significant data downloads.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 22 stars in the last 90 days
