DialoGPT by microsoft

Response generation model via large-scale pretraining

Created 6 years ago
2,406 stars

Top 19.1% on SourcePulse

Project Summary

DialoGPT is a large-scale pretrained language model for dialogue response generation, built upon OpenAI's GPT-2 architecture. It is designed for researchers and developers working on conversational AI, offering pre-trained models of varying sizes (117M, 345M, 762M parameters) trained on a massive dataset of multi-turn dialogues from Reddit. The primary benefit is its ability to generate human-like responses, achieving comparable quality to human responses in single-turn Turing tests.

How It Works

DialoGPT leverages the Transformer architecture, specifically GPT-2, for its generative capabilities. It was pre-trained on 147 million multi-turn dialogues extracted from Reddit discussion threads. The model's advantage lies in its scale and the extensive dialogue data, enabling it to capture nuanced conversational patterns and generate contextually relevant responses. The project also includes a retrieval-augmented variant (RetGen) for improved knowledge grounding.
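The pretraining format concatenates all turns of a dialogue session into one token sequence, with each turn terminated by GPT-2's end-of-text token, so the model learns to generate the next turn conditioned on the full history. A minimal sketch of that input construction (the `EOS` literal is GPT-2's `<|endoftext|>` token; the function name is illustrative, not part of the repo's API):

```python
# Sketch of DialoGPT-style context construction: dialogue turns are
# joined into a single sequence, each turn ended by GPT-2's
# end-of-text token; the model is then asked to continue it.
EOS = "<|endoftext|>"

def build_dialogue_input(turns):
    """Join a multi-turn dialogue history into one model input string."""
    return EOS.join(turns) + EOS

history = ["Does money buy happiness?",
           "Depends how much money you spend on it."]
prompt = build_dialogue_input(history)
# The model would generate the next turn as a continuation of `prompt`.
```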

Quick Start & Requirements

Highlighted Details

  • Achieved state-of-the-art results in the DSTC-7 challenge for response generation.
  • Human evaluation shows generated responses are comparable to human quality in single-turn Turing tests.
  • Offers three model sizes (117M, 345M, 762M parameters) and a ranking model (DialogRPT).
  • Includes scripts for data extraction, preprocessing, and training, with support for distributed training and FP16.
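The DialogRPT ranking model mentioned above scores candidate responses so the most human-preferred one can be selected; the scoring model itself is out of scope here, but the rerank step reduces to sorting candidates by score. A minimal sketch with a hypothetical stand-in scorer (`toy_score` is not DialogRPT's actual model):

```python
# Reranking sketch: a ranking model (e.g. DialogRPT) assigns each
# candidate response a score, and the highest-scoring one wins.
# `toy_score` is a hypothetical stand-in for the real scoring model.
def toy_score(context, response):
    # Hypothetical heuristic: favor replies that share words with the
    # context, with a small bonus for longer responses.
    overlap = len(set(context.lower().split()) & set(response.lower().split()))
    return overlap + 0.1 * len(response.split())

def rerank(context, candidates, score_fn=toy_score):
    """Return candidates sorted best-first by the ranking model's score."""
    return sorted(candidates, key=lambda r: score_fn(context, r), reverse=True)

context = "what do you think about the new model release?"
candidates = ["ok", "I think the new model release looks promising.", "no"]
best = rerank(context, candidates)[0]
```

In the real pipeline, a generator proposes several candidate continuations and DialogRPT's learned score replaces the heuristic here.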

Maintenance & Community

The project page states it is no longer actively maintained, recommending GODEL as a successor. However, data generation pipelines were updated in July 2022. Contact: DialoGPT@microsoft.com.

Licensing & Compatibility

The repository is licensed under the MIT License. It is compatible with commercial use and closed-source linking.

Limitations & Caveats

The project explicitly withholds its decoding (response-generation) script because of the risk of toxic or inappropriate outputs; access to it is granted by invitation only. Official support is limited to Ubuntu 16.04.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

InternEvo by InternLM

Top 0.2% on SourcePulse
407 stars
Lightweight training framework for model pre-training
Created 1 year ago
Updated 4 weeks ago
Starred by Théophile Gervet (Cofounder of Genesis AI), Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML), and 6 more.

lingua by facebookresearch

Top 0.1% on SourcePulse
5k stars
LLM research codebase for training and inference
Created 11 months ago
Updated 2 months ago