FActScore by shmsw25

Factual precision evaluation package for long-form text generation

Created 2 years ago
415 stars

Project Summary

FActScore is a Python package designed to evaluate the factual precision of long-form text generation, targeting researchers and developers working with large language models. It provides a fine-grained, atomic evaluation of generated content against a knowledge source, enabling precise measurement of factual accuracy.

How It Works

FActScore decomposes generated text into atomic facts (short statements that each convey a single piece of information), retrieves supporting evidence for each one from a knowledge source (Wikipedia by default), and then uses a model (such as ChatGPT or LLaMA) to judge whether the evidence supports it. A response's score is the fraction of its atomic facts that are supported, so the metric measures factual precision rather than recall. The evaluation is also inexpensive: API usage is estimated at about $1 per 100 sentences.
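The pipeline can be illustrated with a deliberately tiny, self-contained sketch. Every component in it is a hypothetical stand-in rather than the package's code: the one-entry `KNOWLEDGE` dictionary replaces the Wikipedia dump, splitting on sentence boundaries replaces LM-based atomic-fact decomposition, word overlap replaces the real retriever, and a word-containment check replaces the ChatGPT or LLaMA verifier. Only the last step, scoring a response as the fraction of its atomic facts that are supported, reflects the metric itself.

```python
import re
from typing import Dict, List, Set

# Hypothetical toy knowledge source (title -> article text); the real
# package defaults to a Wikipedia dump, not this dictionary.
KNOWLEDGE: Dict[str, str] = {
    "Marie Curie": (
        "Marie Curie was a physicist and chemist. "
        "She was born in Warsaw. "
        "She won the Nobel Prize in Physics in 1903."
    ),
}

STOPWORDS = {"a", "an", "the", "was", "is", "in", "and", "of", "she", "he"}


def content_words(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens minus a tiny stopword list."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def atomic_facts(generation: str) -> List[str]:
    """Hypothetical stand-in for LM-based decomposition: one 'fact' per sentence."""
    return split_sentences(generation)


def retrieve(topic: str, fact: str, k: int = 2) -> List[str]:
    """Rank the topic article's sentences by word overlap with the fact; keep top k."""
    passages = split_sentences(KNOWLEDGE.get(topic, ""))
    query = content_words(fact)
    ranked = sorted(passages, key=lambda p: len(query & content_words(p)), reverse=True)
    return ranked[:k]


def is_supported(fact: str, evidence: List[str]) -> bool:
    """Toy verifier: every content word of the fact appears in the evidence.
    (The real package asks a model such as ChatGPT or LLaMA instead.)"""
    evidence_words: Set[str] = set()
    for passage in evidence:
        evidence_words |= content_words(passage)
    return content_words(fact) <= evidence_words


def factual_precision(topic: str, generation: str) -> float:
    """Fraction of the generation's atomic facts that the evidence supports."""
    facts = atomic_facts(generation)
    if not facts:
        raise ValueError("generation has no atomic facts")
    supported = [is_supported(f, retrieve(topic, f)) for f in facts]
    return sum(supported) / len(facts)


if __name__ == "__main__":
    bio = ("Marie Curie was a physicist. She was born in Paris. "
           "She won the Nobel Prize in Physics.")
    print(factual_precision("Marie Curie", bio))
```

In this example the "born in Paris" fact finds no support in the toy article, so the response scores 2/3. Averaging such per-response scores over many generations gives a model-level number.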

Quick Start & Requirements

  • Install: pip install --upgrade factscore
  • Prerequisites: Python 3.7+ and the spaCy model en_core_web_sm (python -m spacy download en_core_web_sm).
  • Data Download: python -m factscore.download_data --llama_7B_HF_path "llama-7B" (requires LLaMA 7B HuggingFace weights for LLaMA-based evaluation; skip for ChatGPT-only evaluation).
  • Paper: https://arxiv.org/abs/2305.14251

Highlighted Details

  • Evaluates factual precision at an atomic level.
  • Supports multiple evaluation backends: retrieval+ChatGPT and retrieval+llama+npm.
  • Offers a gamma hyperparameter for a length penalty and an abstain_detection flag.
  • Allows custom knowledge source integration.
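A second sketch shows how per-fact verdicts might be rolled up into summary numbers with a gamma length penalty and abstention handling. The penalty form used here, exp(1 - gamma/n) for a response with n < gamma atomic facts, is an assumption for illustration, as are the function names and dictionary keys; the package source is the authority on the exact behavior. In this sketch, responses flagged as abstentions are excluded from the score and only lower the response ratio.

```python
import math
from typing import Dict, List, Optional


def length_penalty(num_facts: int, gamma: float) -> float:
    """ASSUMED penalty form: exp(1 - gamma / n) when n < gamma, else 1.0.

    A falsy gamma (0 or None) disables the penalty in this sketch.
    """
    if num_facts <= 0:
        raise ValueError("num_facts must be positive")
    if not gamma or num_facts >= gamma:
        return 1.0
    return math.exp(1.0 - gamma / num_facts)


def aggregate(verdicts: List[Optional[List[bool]]], gamma: float = 10) -> Dict[str, float]:
    """Roll per-fact support verdicts up into summary numbers (hypothetical API).

    verdicts[i] is None if response i was detected as an abstention
    (e.g. "I don't know"); otherwise it holds one True/False per atomic fact.
    """
    answered = [v for v in verdicts if v is not None]
    if not answered:
        raise ValueError("every response abstained; score is undefined")
    if any(len(v) == 0 for v in answered):
        raise ValueError("a non-abstaining response has no atomic facts")

    # Per-response precision: supported facts / all facts.
    init_scores = [sum(v) / len(v) for v in answered]
    # Shrink the precision of responses with fewer than gamma facts.
    scores = [s * length_penalty(len(v), gamma) for s, v in zip(init_scores, answered)]
    return {
        "score": sum(scores) / len(scores),
        "init_score": sum(init_scores) / len(init_scores),
        "respond_ratio": len(answered) / len(verdicts),
        "num_facts_per_response": sum(len(v) for v in answered) / len(answered),
    }


if __name__ == "__main__":
    # Response 1: 4 of 5 facts supported; response 2 abstains; response 3: 20 of 20.
    out = aggregate([[True, True, True, True, False], None, [True] * 20], gamma=10)
    print(out)
```

The motivation for a penalty of this kind is that, without one, a model could raise its precision by stating only one or two safe facts; scaling down short responses counters that.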

Maintenance & Community

The project is associated with the EMNLP 2023 paper "FActScore: Fine-grained Atomic Evaluation of Factual Precision in Long Form Text Generation." Further community interaction details are not explicitly provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Public availability on GitHub does not by itself imply a permissive open-source license, so commercial use or closed-source linking requires confirming the license terms first.

Limitations & Caveats

The default knowledge source is a Wikipedia dump from April 2023, so it may be out of date. The cost estimate is based on OpenAI API pricing, which can change. Reproducing results requires specific model weights (e.g., LLaMA 7B) and potentially large data downloads.
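The quoted cost figure lends itself to a back-of-the-envelope calculation. This hypothetical helper treats the rate as a parameter that defaults to the quoted $1 per 100 sentences, since actual API pricing changes over time.

```python
def estimate_cost_usd(num_generations: int, avg_sentences: float,
                      usd_per_100_sentences: float = 1.0) -> float:
    """Rough API cost: total sentences / 100 * rate (hypothetical helper).

    The rate is a parameter because provider pricing changes.
    """
    if num_generations < 0 or avg_sentences < 0 or usd_per_100_sentences < 0:
        raise ValueError("inputs must be non-negative")
    total_sentences = num_generations * avg_sentences
    return total_sentences / 100 * usd_per_100_sentences


if __name__ == "__main__":
    # 500 biographies averaging 8 sentences each: 4,000 sentences.
    print(f"${estimate_cost_usd(500, 8):.2f}")  # $40.00
```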

Health Check

  • Last Commit: 10 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 2 stars in the last 30 days

Explore Similar Projects

Starred by John Resig (Author of jQuery; Chief Software Architect at Khan Academy), Sasha Rush (Research Scientist at Cursor; Professor at Cornell Tech), and 2 more.

llmparser by kyang6

0%
428
LLM tool for structured data extraction and classification
Created 2 years ago
Updated 2 years ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Travis Fischer (Founder of Agentic).

long-form-factuality by google-deepmind

0%
666
Benchmark for long-form factuality in LLMs
Created 1 year ago
Updated 2 weeks ago