summarize-from-feedback by openai

Code and data for summarization research

created 4 years ago
1,036 stars

Top 36.9% on sourcepulse

View on GitHub: https://github.com/openai/summarize-from-feedback
Project Summary

This repository provides code for training and evaluating summarization models with human feedback, targeting NLP researchers and developers. It supports replicating the models from OpenAI's "Learning to Summarize from Human Feedback" paper: a supervised baseline, a reward model, and an RL fine-tuned policy.

How It Works

The project implements a reinforcement learning approach where a summarization model is fine-tuned using a reward model trained on human preferences. This reward model learns to score summaries based on pairwise comparisons provided by human annotators, guiding the RL policy to generate summaries that align with human judgment. This method aims to improve summary quality beyond standard supervised learning.
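
As a rough illustration of the reward-model training step described above (a sketch, not code taken from the repository), a standard pairwise preference loss in PyTorch looks like the following: given the model's scalar scores for the human-preferred and dispreferred summaries of the same post, it maximizes the probability that the preferred one scores higher.

    import torch
    import torch.nn.functional as F

    def pairwise_preference_loss(reward_preferred: torch.Tensor,
                                 reward_rejected: torch.Tensor) -> torch.Tensor:
        """Bradley-Terry style loss over a batch of pairwise human comparisons.

        Both arguments hold the reward model's scalar scores for the two
        summaries in each comparison; the first is the human-preferred one.
        """
        # Maximize log P(preferred beats rejected) = log sigmoid(r_pref - r_rej).
        return -F.logsigmoid(reward_preferred - reward_rejected).mean()

    # Toy usage with scores for three comparisons.
    loss = pairwise_preference_loss(torch.tensor([1.2, 0.4, 0.9]),
                                    torch.tensor([0.3, 0.8, -0.1]))
    print(float(loss))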

Quick Start & Requirements
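
The code targets Ubuntu 18.04 with Python 3.7 and requires an Nvidia GPU; see the repository's README on GitHub for installation and usage steps.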

Highlighted Details

  • Released human feedback dataset with 64,832 summary comparisons (a record-level sketch follows this list).
  • Includes code for supervised baseline, reward model, and RL policy.
  • Supports evaluation on TL;DR and CNN/DM datasets.
  • Provides filtered versions of the TL;DR dataset.
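
For orientation, the sketch below shows how one comparison record might be consumed in Python. The field names ("info", "summaries", "choice") are assumptions about the JSON layout rather than a documented schema, so verify them against the dataset files before relying on this.

    import json

    # Assumed (illustrative) record layout: a post, two candidate summaries,
    # and the index of the summary the human annotator preferred.
    example_line = (
        '{"info": {"post": "Original Reddit post text ..."}, '
        '"summaries": [{"text": "Candidate summary A"}, '
        '{"text": "Candidate summary B"}], '
        '"choice": 1}'
    )

    record = json.loads(example_line)
    preferred = record["summaries"][record["choice"]]["text"]
    rejected = record["summaries"][1 - record["choice"]]["text"]
    print("preferred:", preferred)
    print("rejected:", rejected)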

Maintenance & Community

  • Status: Archived (code is provided as-is; no updates are expected).

Licensing & Compatibility

  • Original TL;DR dataset is licensed under CC BY 4.0.
  • No explicit license is mentioned for the code itself.

Limitations & Caveats

The project is archived and will not receive updates. It has specific platform requirements (Ubuntu 18.04, Python 3.7) and requires an Nvidia GPU, potentially limiting broader adoption.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 17 stars in the last 90 days
