transfer-learning-conv-ai by huggingface

Conversational AI code for transfer learning research

Created 6 years ago
1,748 stars

Top 24.5% on SourcePulse

Project Summary

This repository provides code for building state-of-the-art conversational AI agents using transfer learning from OpenAI's GPT and GPT-2 models. It's designed for researchers and developers aiming to reproduce results from the NeurIPS 2018 ConvAI2 competition or fine-tune their own dialogue systems. The project offers clean, commented code for training and inference, with options for distributed training and FP16 precision.

How It Works

The core approach leverages transfer learning from pre-trained Transformer language models (GPT, GPT-2). Dialogue history and personality context are concatenated and fed into the model, which then generates responses. The training script supports multi-task learning, combining a language-modeling objective with a multiple-choice objective to improve conversational quality. For decoding, the README recommends nucleus (top-p) sampling over beam search, as it yields more compelling, human-like interactions.
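The input construction described above can be sketched roughly as follows. This is an illustrative assumption, not the repository's exact code: the special-token names (`<bos>`, `<speaker1>`, `<speaker2>`, `<eos>`) and the `build_input_sequence` helper are hypothetical, standing in for however the repo delimits persona facts and alternating dialogue turns before tokenization:

```python
def build_input_sequence(persona, history, reply):
    """Flatten persona facts, dialogue history, and the candidate reply
    into one token list with alternating speaker markers.
    Token names here are illustrative, not the repo's actual vocabulary."""
    sequence = ["<bos>"] + persona        # persona context comes first
    turns = history + [reply]             # the reply is the final turn
    for i, turn in enumerate(turns):
        # Alternate markers so the model's reply is always <speaker2>.
        speaker = "<speaker2>" if (len(turns) - i) % 2 == 1 else "<speaker1>"
        sequence.append(speaker)
        sequence.extend(turn)
    return sequence + ["<eos>"]

tokens = build_input_sequence(
    persona=["i", "like", "ski"],
    history=[["hello", "!"], ["hi", "there"]],
    reply=["do", "you", "ski", "?"],
)
```

In practice each turn would be sub-word tokenized and accompanied by segment embeddings, but the flattening idea is the same.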

Quick Start & Requirements

  • Install: git clone the repo, cd into it, and run pip install -r requirements.txt. Also requires python -m spacy download en.
  • Docker: Build with docker build -t convai . (ensure sufficient memory allocation).
  • Pretrained Model: Run python interact.py to automatically download and use a fine-tuned model.
  • Dependencies: Python, PyTorch, spaCy, Apex (for FP16). GPU with CUDA is recommended for training.
  • Resources: Training on 8 V100 GPUs takes about an hour.

Highlighted Details

  • Reproduces state-of-the-art results from the ConvAI2 competition.
  • Offers distributed training and FP16 support for faster training.
  • Includes scripts for training, inference, and ConvAI2 evaluation.
  • Fine-tuned model available for immediate use.

Maintenance & Community

This project is associated with Hugging Face. Specific community channels or active maintenance status are not detailed in the README.

Licensing & Compatibility

The repository is licensed under the MIT License. This license permits commercial use and integration with closed-source projects.

Limitations & Caveats

The README notes that results may come in slightly below the original competition numbers without additional tweaks such as custom position embeddings. It also notes that while beam search improves the F1 score, it produces a less compelling human experience than the default nucleus sampling.
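The beam-search-versus-nucleus-sampling trade-off can be illustrated with a minimal top-p filter. This is a plain-Python sketch of the general technique, not the repository's implementation (which operates on PyTorch logits):

```python
import random

def nucleus_sample(probs, top_p=0.9, rng=random):
    """Sample an index from a probability list, restricted to the
    smallest set of highest-probability entries whose cumulative
    mass reaches top_p (the 'nucleus')."""
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    nucleus, mass = [], 0.0
    for idx, p in ranked:
        nucleus.append((idx, p))
        mass += p
        if mass >= top_p:
            break
    # Renormalise within the nucleus, then draw one index.
    r = rng.random() * mass
    for idx, p in nucleus:
        r -= p
        if r <= 0:
            return idx
    return nucleus[-1][0]

# e.g. with top_p=0.7, only the two most likely tokens can be drawn
choice = nucleus_sample([0.5, 0.3, 0.1, 0.1], top_p=0.7)
```

Unlike beam search, which deterministically maximizes likelihood (and tends toward bland, repetitive replies), this keeps some randomness while cutting off the low-probability tail.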

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 30 days

Explore Similar Projects

Starred by Elvis Saravia (Founder of DAIR.AI) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

awesome-transformer-nlp by cedrickchee

1k stars
Curated list of NLP resources for Transformer networks
Created 6 years ago
Updated 10 months ago
Starred by Lukas Biewald (Cofounder of Weights & Biases), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

DialoGPT by microsoft

2k stars
Response generation model via large-scale pretraining
Created 6 years ago
Updated 2 years ago