Response generation model via large-scale pretraining
DialoGPT is a large-scale pretrained language model for dialogue response generation, built on OpenAI's GPT-2 architecture. Aimed at researchers and developers working on conversational AI, it offers pre-trained models in three sizes (117M, 345M, and 762M parameters), trained on a massive dataset of multi-turn dialogues from Reddit. Its primary benefit is generating human-like responses, with quality comparable to human responses in a single-turn Turing test.
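As a minimal sketch of loading a released checkpoint, assuming the Hugging Face transformers library and its microsoft/DialoGPT-* Hub mirrors of the three model sizes (the prompt and generation settings here are illustrative, not tuned):

from transformers import AutoModelForCausalLM, AutoTokenizer

# microsoft/DialoGPT-small / -medium / -large correspond to the
# 117M / 345M / 762M models.
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-medium")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-medium")

# A single user turn, terminated by the end-of-text token.
ids = tokenizer.encode("Does money buy happiness?" + tokenizer.eos_token,
                       return_tensors="pt")
out = model.generate(ids, max_new_tokens=40, pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(out[0, ids.shape[-1]:], skip_special_tokens=True))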
How It Works
DialoGPT leverages the Transformer architecture, specifically GPT-2, for its generative capabilities, and was pre-trained on 147 million multi-turn dialogues extracted from Reddit discussion threads. Response generation is framed as language modeling: a dialogue session is concatenated into one long token sequence, with turns separated by the end-of-text token. The model's advantage lies in its scale and the breadth of its dialogue data, enabling it to capture nuanced conversational patterns and generate contextually relevant responses. The project also includes a retrieval-augmented variant (RetGen) for improved knowledge grounding.
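A sketch of that input format, again assuming the transformers checkpoints (the example turns and decoding settings are illustrative): each user turn is appended to the EOS-delimited history, and the model's reply is folded back into the context:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
model = AutoModelForCausalLM.from_pretrained("microsoft/DialoGPT-small")

history = None
for turn in ["Hi, how are you?", "Any plans for the weekend?"]:
    new_ids = tokenizer.encode(turn + tokenizer.eos_token, return_tensors="pt")
    # Multi-turn context is just one long EOS-separated sequence.
    history = new_ids if history is None else torch.cat([history, new_ids], dim=-1)
    out = model.generate(history, max_new_tokens=40,
                         pad_token_id=tokenizer.eos_token_id)
    print("Bot:", tokenizer.decode(out[0, history.shape[-1]:],
                                   skip_special_tokens=True))
    history = out  # the reply (with its EOS) becomes part of the next context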
Quick Start & Requirements
Create and activate the conda environment:

conda env create -f LSP-linux.yml -n LSP
conda activate LSP

CUDA 10.0 is required.
The demo.py script can download models and data. Processing the full Reddit dataset (27GB+) requires approximately 10 hours with 8 processes and 800GB of disk space.
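A typical invocation might look like the following (the --data flag and its values are assumed from the demo script's options, not verified here; check demo.py for the exact interface):

python demo.py --data small    # download pretrained models and a small data sample
python demo.py --data full     # run the full 27GB+ Reddit pipeline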
Highlighted Details

Maintenance & Community
The project page states it is no longer actively maintained, recommending GODEL as a successor. However, data generation pipelines were updated in July 2022. Contact: DialoGPT@microsoft.com.
Licensing & Compatibility
The repository is licensed under the MIT License. It is compatible with commercial use and closed-source linking.
Limitations & Caveats
The project explicitly states that decoding scripts are not provided due to the potential for generating toxic/inappropriate responses, and access is invitation-only. Support is limited to Ubuntu 16.04.