RLSeq2Seq by yaserkl

Research paper code for sequence-to-sequence models using deep reinforcement learning

Created 7 years ago

768 stars

Top 45.4% on SourcePulse


2 Experts Love This Project

Shizhe Diao

Author of LMFlow; Research Scientist at NVIDIA

Luis Capelo

Cofounder of Lightning AI

Project Summary

This repository provides a framework for applying Deep Reinforcement Learning (RL) techniques to sequence-to-sequence (seq2seq) models, primarily for abstractive text summarization. It addresses common seq2seq challenges like exposure bias and train/test inconsistency by integrating RL methods. The target audience includes researchers and practitioners in NLP and deep learning looking to leverage RL for improved seq2seq performance.

How It Works

The framework implements several RL approaches for seq2seq tasks, including Scheduled Sampling (with hard/soft argmax), End-to-End Backpropagation, Policy-Gradient with Self-Critic, and Actor-Critic methods using DDQN and Dueling Networks. These RL techniques aim to optimize seq2seq models directly for task-specific metrics (like ROUGE scores) rather than relying solely on maximum likelihood estimation, thereby mitigating exposure bias and improving generation quality.
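The sketch below illustrates the self-critic idea behind the policy-gradient option in plain Python. In self-critical training, the reward of the model's own greedy (argmax) decode is the baseline for a sampled decode. The sketch also shows a γ-weighted mix of that RL loss with the usual MLE loss. The function names, the `gamma` weight, and the example numbers are hypothetical and do not come from RLSeq2Seq, which implements its methods in TensorFlow 1.x. The reward would typically be a ROUGE score.

```python
import math
from typing import Sequence


def sequence_log_prob(token_log_probs: Sequence[float]) -> float:
    """Return log p(w_1..w_T | x) as the sum of per-token log-probabilities."""
    return sum(token_log_probs)


def self_critical_loss(sampled_log_probs: Sequence[float],
                       sampled_reward: float,
                       greedy_reward: float) -> float:
    """Self-critical policy-gradient loss for one sampled sequence.

    The greedy decode's reward is the baseline. Minimizing this loss raises
    the probability of the sampled sequence only when it scores better than
    the greedy one. Rewards are plain constants here; in an autodiff framework
    no gradient should flow through them.
    """
    advantage = sampled_reward - greedy_reward
    return -advantage * sequence_log_prob(sampled_log_probs)


def mle_loss(target_log_probs: Sequence[float]) -> float:
    """Teacher-forced negative log-likelihood of the reference sequence."""
    return -sequence_log_prob(target_log_probs)


def mixed_loss(rl_loss: float, ml_loss: float, gamma: float) -> float:
    """Combined objective gamma * L_rl + (1 - gamma) * L_ml (gamma is hypothetical)."""
    return gamma * rl_loss + (1.0 - gamma) * ml_loss


if __name__ == "__main__":
    # Made-up numbers: token probabilities of a sampled two-token summary and
    # hypothetical ROUGE rewards for the sampled and greedy decodes.
    sampled = [math.log(0.5), math.log(0.4)]
    rl = self_critical_loss(sampled, sampled_reward=0.42, greedy_reward=0.35)
    ml = mle_loss([math.log(0.6), math.log(0.7)])
    print(f"L_rl={rl:.4f}  L_ml={ml:.4f}  L_mixed={mixed_loss(rl, ml, gamma=0.9):.4f}")
```

Using the greedy decode as the baseline avoids training a separate value estimator. If a sample only ties the greedy output, the advantage is zero and the sample contributes no gradient.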

Quick Start & Requirements

Install: pip install -r python_requirements.txt
Prerequisites: Python 2.7, TensorFlow 1.10.1, CUDA 9, and cuDNN 7.1.
Data: Requires pre-processed CNN/Daily Mail or Newsroom datasets. Helper scripts are provided for downloading and preprocessing.
Documentation: the accompanying arXiv paper, "Deep Reinforcement Learning for Sequence-to-Sequence Models"

Highlighted Details

Implements various RL strategies: Scheduled Sampling, Policy-Gradient (Self-Critic), and Actor-Critic (DDQN).
Supports attention mechanisms: temporal attention and intra-decoder attention.
Offers options for different training regimes: MLE, RL, and combined MLE+RL.
Includes detailed command-line examples for training and evaluation of different models.
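The temporal attention listed above discourages the decoder from repeatedly attending to the same source positions. The plain-Python sketch below assumes the common formulation used in deep reinforced summarization models. In that formulation, each exponentiated score is divided by the sum of that position's exponentiated scores at all earlier decoder steps, and the result is then normalized. The function name and the input layout are hypothetical. The repository's TensorFlow implementation may differ in detail.

```python
import math
from typing import List


def temporal_attention(scores: List[List[float]]) -> List[List[float]]:
    """Temporal attention weights alpha[t][i] for every decoder step t.

    scores[t][i] is the raw score e_ti between decoder step t and encoder
    position i (how it is computed, e.g. bilinear, is left open here).
    Each row of the result sums to 1 over encoder positions.
    """
    num_inputs = len(scores[0])
    # Running sum over previous steps j < t of exp(e_ji), per encoder position.
    history = [0.0] * num_inputs
    weights = []
    for t, step_scores in enumerate(scores):
        exp_scores = [math.exp(e) for e in step_scores]
        if t == 0:
            penalized = exp_scores
        else:
            # Dividing by past attention mass down-weights positions that
            # earlier decoder steps already focused on.
            penalized = [e / h for e, h in zip(exp_scores, history)]
        total = sum(penalized)
        weights.append([p / total for p in penalized])
        history = [h + e for h, e in zip(history, exp_scores)]
    # No log-space stabilization: very large scores can overflow math.exp.
    return weights


if __name__ == "__main__":
    for row in temporal_attention([[2.0, 0.0], [0.0, 0.0]]):
        print([round(a, 3) for a in row])
```

With scores `[[2.0, 0.0], [0.0, 0.0]]`, the first step puts about 0.88 of its weight on position 0. The second step has tied raw scores, yet it shifts about 0.88 of its weight to position 1, because position 0 was already heavily attended.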

Maintenance & Community

The project is marked as "no longer actively maintained." Contributions are welcome via pull requests.

Licensing & Compatibility

License: MIT License (as indicated by the PyPI badge, though the LICENSE.txt file is not directly linked).
Compatibility: Requires older versions of TensorFlow (1.10.1) and Python (2.7), which may pose compatibility challenges with modern systems.

Limitations & Caveats

The project explicitly states that it is "no longer actively maintained." Its pinned stack is also dated. Python 2.7 reached end of life in January 2020, and CUDA 9 and cuDNN 7.1 predate current NVIDIA GPU generations. Running the code on a modern system will therefore likely require either a legacy environment or porting to current TensorFlow and Python.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

0 stars in the last 30 days