Code and data for summarization research
This repository provides code for training and evaluating summarization models with human feedback, aimed at NLP researchers and developers. It supports replicating OpenAI's "Learning to Summarize from Human Feedback" paper and includes a supervised baseline, a reward model, and an RL fine-tuned policy.
How It Works
The project implements a reinforcement learning approach where a summarization model is fine-tuned using a reward model trained on human preferences. This reward model learns to score summaries based on pairwise comparisons provided by human annotators, guiding the RL policy to generate summaries that align with human judgment. This method aims to improve summary quality beyond standard supervised learning.
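As a concrete illustration, the reward model reduces to a network that maps a (post, summary) representation to a scalar score and is trained so that the human-preferred summary in each pair scores higher. Below is a minimal PyTorch sketch, not the repository's actual code: the small linear encoder stands in for the pretrained transformer, and all names are illustrative.

import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    # Illustrative stand-in: the paper uses a pretrained transformer encoder.
    def __init__(self, hidden_size=768):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.Tanh())
        self.score_head = nn.Linear(hidden_size, 1)  # scalar reward per summary

    def forward(self, features):
        # features: (batch, hidden_size) pooled representation of post + summary
        return self.score_head(self.encoder(features)).squeeze(-1)

def preference_loss(r_preferred, r_rejected):
    # Pairwise comparison objective: -log sigmoid(r_preferred - r_rejected),
    # which pushes the preferred summary's score above the rejected one's.
    return -F.logsigmoid(r_preferred - r_rejected).mean()

# Toy usage: random features stand in for encoded (post, summary) pairs.
model = RewardModel()
loss = preference_loss(model(torch.randn(4, 768)), model(torch.randn(4, 768)))
loss.backward()

During RL fine-tuning, the frozen reward model's scalar output serves as the reward signal for the policy; the paper optimizes it with PPO plus a KL penalty against the supervised baseline.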
Quick Start & Requirements
Install dependencies with pipenv:
pipenv install
Run the sample script as a quick test:
pipenv run exps/sample.py test test-sample
Download the human feedback dataset with azcopy:
azcopy copy "https://openaipublic.blob.core.windows.net/summarize-from-feedback/dataset/*" . --recursive
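Each downloaded comparison record pairs a post with two candidate summaries and the annotator's choice. Here is a minimal loader sketch, assuming the dataset's JSON-lines layout; the field names ("info", "summaries", "choice") follow the dataset's documented format but may need adjusting for your copy.

import json

def load_comparisons(path):
    # Yields (post, (summary_a, summary_b), index_of_preferred) tuples.
    with open(path) as f:
        for line in f:
            record = json.loads(line)
            post = record["info"]["post"]                # original post text
            texts = [s["text"] for s in record["summaries"]]
            yield post, tuple(texts), record["choice"]   # choice is 0 or 1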
Highlighted Details
The release includes the supervised baseline, reward model, and RL fine-tuned policy described above, along with the human comparison dataset used to train the reward model.
Maintenance & Community
The repository is archived and inactive; the last update was about a year ago.
Licensing & Compatibility
The code targets Ubuntu 18.04 with Python 3.7 and requires an Nvidia GPU.
Limitations & Caveats
The project is archived and will not receive updates. Its specific platform requirements (Ubuntu 18.04, Python 3.7, and an Nvidia GPU) may limit broader adoption.