Evaluation toolkit for summarization models
This repository provides resources for the "SummEval: Re-evaluating Summarization Evaluation" paper, offering a comprehensive dataset of model-generated summaries and human annotations for evaluating summarization systems. It targets researchers and practitioners in Natural Language Processing (NLP) who need robust tools and data for assessing summarization quality.
How It Works
The project provides pre-computed outputs from 23 state-of-the-art summarization models, including both extractive and abstractive approaches, alongside human annotations across four dimensions: coherence, consistency, fluency, and relevance. It also includes an evaluation toolkit that unifies popular and novel metrics such as ROUGE, MoverScore, BERTScore, and BLANC, enabling standardized and reproducible evaluation of summarization models.
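As a minimal sketch of that unified interface, the snippet below scores a small batch of summaries with two metrics through the Python API. The module paths, class names (RougeMetric, BertScoreMetric), and the evaluate_batch method follow the usage pattern shown in the repository, but treat them as assumptions and verify them against the installed summ_eval version.

```python
# Sketch: score a batch of summaries with two metrics via the Python API.
# Module and class names are assumed; check them against your summ_eval install.
from summ_eval.rouge_metric import RougeMetric
from summ_eval.bert_score_metric import BertScoreMetric

summaries = [
    "the cat sat on the mat .",
    "officials announced the merger on monday .",
]
references = [
    "a cat was sitting on the mat .",
    "the merger was announced by officials on monday .",
]

# Each metric class exposes the same evaluate_batch(summaries, references)
# entry point, which is what makes combining metrics straightforward.
rouge = RougeMetric()
bertscore = BertScoreMetric()

scores = {}
scores.update(rouge.evaluate_batch(summaries, references))
scores.update(bertscore.evaluate_batch(summaries, references))

for name, value in scores.items():
    print(f"{name}: {value}")
```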
Quick Start & Requirements
Install the evaluation toolkit with pip install summ-eval. To rebuild the paired dataset of model outputs and source documents, run the data_processing/pair_data.py script; the external source datasets must be downloaded manually beforehand.
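Once the paired annotation file is available, it is plain JSON Lines and can be inspected without the toolkit. The sketch below averages expert ratings per model across the four annotated dimensions; the file name and the model_id / expert_annotations field names are assumptions about the released annotation format and may need adjusting to match your copy of the data.

```python
# Sketch: aggregate expert annotations per model from a paired JSONL file.
# The file name and the "model_id" / "expert_annotations" fields are assumptions;
# adjust them to match the actual paired-data output.
import json
from collections import defaultdict

DIMENSIONS = ("coherence", "consistency", "fluency", "relevance")
totals = defaultdict(lambda: defaultdict(float))
counts = defaultdict(int)

with open("model_annotations.aligned.jsonl", encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        model = record["model_id"]
        # Each record is assumed to carry a list of expert ratings,
        # one dict per annotator with a score for every dimension.
        for annotation in record["expert_annotations"]:
            for dim in DIMENSIONS:
                totals[model][dim] += annotation[dim]
        counts[model] += len(record["expert_annotations"])

for model, dims in sorted(totals.items()):
    averages = {dim: dims[dim] / counts[model] for dim in DIMENSIONS}
    print(model, averages)
```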
Highlighted Details
The toolkit exposes both a command-line tool (calc-scores) and a Python API for easy integration.
Maintenance & Community
This project is a collaboration between Yale LILY Lab and Salesforce Research. Issues and Pull Requests are welcome via GitHub.
Licensing & Compatibility
The repository's license is not explicitly stated in the README. Model outputs are shared with author consent and require citing original papers.
Limitations & Caveats
The README does not specify a license, which may impact commercial use. The data pairing process requires manual downloading of external datasets.