DialoGPT by microsoft

Response generation model via large-scale pretraining

Created 6 years ago
2,406 stars

Top 19.1% on SourcePulse

Project Summary

DialoGPT is a large-scale pretrained language model for dialogue response generation, built upon OpenAI's GPT-2 architecture. It is designed for researchers and developers working on conversational AI, offering pre-trained models of varying sizes (117M, 345M, 762M parameters) trained on a massive dataset of multi-turn dialogues from Reddit. The primary benefit is its ability to generate human-like responses, achieving comparable quality to human responses in single-turn Turing tests.

How It Works

DialoGPT leverages the Transformer architecture, specifically GPT-2, for its generative capabilities. It was pre-trained on 147 million multi-turn dialogues extracted from Reddit discussion threads. The model's advantage lies in its scale and the extensive dialogue data, enabling it to capture nuanced conversational patterns and generate contextually relevant responses. The project also includes a retrieval-augmented variant (RetGen) for improved knowledge grounding.
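The pretraining format concatenates all turns of a dialogue session into one token sequence, with each turn terminated by GPT-2's end-of-text token, so the model learns to generate the next turn conditioned on the full history. A minimal sketch of that input construction (the `EOS` literal is GPT-2's `<|endoftext|>` token; the function name is illustrative, not part of the repo's API):

```python
# Sketch of DialoGPT-style context construction: dialogue turns are
# joined into a single sequence, each turn ended by GPT-2's
# end-of-text token; the model is then asked to continue it.
EOS = "<|endoftext|>"

def build_dialogue_input(turns):
    """Join a multi-turn dialogue history into one model input string."""
    return EOS.join(turns) + EOS

history = ["Does money buy happiness?",
           "Depends how much money you spend on it."]
prompt = build_dialogue_input(history)
# The model would generate the next turn as a continuation of `prompt`.
```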

Quick Start & Requirements

Highlighted Details

  • Achieved state-of-the-art results in the DSTC-7 challenge for response generation.
  • Human evaluation shows generated responses are comparable to human quality in single-turn Turing tests.
  • Offers three model sizes (117M, 345M, 762M parameters) and a ranking model (DialogRPT).
  • Includes scripts for data extraction, preprocessing, and training, with support for distributed training and FP16.
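The DialogRPT ranking model mentioned above scores candidate responses so the most human-preferred one can be selected; the scoring model itself is out of scope here, but the rerank step reduces to sorting candidates by score. A minimal sketch with a hypothetical stand-in scorer (`toy_score` is not DialogRPT's actual model):

```python
# Reranking sketch: a ranking model (e.g. DialogRPT) assigns each
# candidate response a score, and the highest-scoring one wins.
# `toy_score` is a hypothetical stand-in for the real scoring model.
def toy_score(context, response):
    # Hypothetical heuristic: favor replies that share words with the
    # context, with a small bonus for longer responses.
    overlap = len(set(context.lower().split()) & set(response.lower().split()))
    return overlap + 0.1 * len(response.split())

def rerank(context, candidates, score_fn=toy_score):
    """Return candidates sorted best-first by the ranking model's score."""
    return sorted(candidates, key=lambda r: score_fn(context, r), reverse=True)

context = "what do you think about the new model release?"
candidates = ["ok", "I think the new model release looks promising.", "no"]
best = rerank(context, candidates)[0]
```

In the real pipeline, a generator proposes several candidate continuations and DialogRPT's learned score replaces the heuristic here.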

Maintenance & Community

The project page states it is no longer actively maintained, recommending GODEL as a successor. However, data generation pipelines were updated in July 2022. Contact: DialoGPT@microsoft.com.

Licensing & Compatibility

The repository is licensed under the MIT License. It is compatible with commercial use and closed-source linking.

Limitations & Caveats

The project explicitly withholds its decoding (response-generation) script because of the risk of toxic or inappropriate outputs; access to it is granted by invitation only. Official support is limited to Ubuntu 16.04.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days

Explore Similar Projects

Starred by Jeff Hammerbacher (Cofounder of Cloudera) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

InternEvo by InternLM

Top 0.2% on SourcePulse
407 stars
Lightweight training framework for model pre-training
Created 1 year ago
Updated 4 weeks ago
Starred by Théophile Gervet (Cofounder of Genesis AI), Jason Knight (Director AI Compilers at NVIDIA; Cofounder of OctoML), and 6 more.

lingua by facebookresearch

Top 0.1% on SourcePulse
5k stars
LLM research codebase for training and inference
Created 11 months ago
Updated 2 months ago