transfer-learning-conv-ai by huggingface

Conversational AI code for transfer learning research

Created 6 years ago
1,748 stars

Top 24.5% on SourcePulse

Project Summary

This repository provides code for building state-of-the-art conversational AI agents using transfer learning from OpenAI's GPT and GPT-2 models. It's designed for researchers and developers aiming to reproduce results from the NeurIPS 2018 ConvAI2 competition or fine-tune their own dialogue systems. The project offers clean, commented code for training and inference, with options for distributed training and FP16 precision.

How It Works

The core approach leverages transfer learning from pre-trained Transformer language models (GPT, GPT-2). Dialogue history and personality context are concatenated and fed into the model, which then generates responses. The training script supports multi-task learning, combining a language-modeling objective with a multiple-choice objective to improve conversational quality. For decoding, the README recommends nucleus (top-p) sampling over beam search, as it yields more compelling, human-like interactions.
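The input construction described above can be sketched roughly as follows. This is an illustrative assumption, not the repository's exact code: the special-token names (`<bos>`, `<speaker1>`, `<speaker2>`, `<eos>`) and the `build_input_sequence` helper are hypothetical, standing in for however the repo delimits persona facts and alternating dialogue turns before tokenization:

```python
def build_input_sequence(persona, history, reply):
    """Flatten persona facts, dialogue history, and the candidate reply
    into one token list with alternating speaker markers.
    Token names here are illustrative, not the repo's actual vocabulary."""
    sequence = ["<bos>"] + persona        # persona context comes first
    turns = history + [reply]             # the reply is the final turn
    for i, turn in enumerate(turns):
        # Alternate markers so the model's reply is always <speaker2>.
        speaker = "<speaker2>" if (len(turns) - i) % 2 == 1 else "<speaker1>"
        sequence.append(speaker)
        sequence.extend(turn)
    return sequence + ["<eos>"]

tokens = build_input_sequence(
    persona=["i", "like", "ski"],
    history=[["hello", "!"], ["hi", "there"]],
    reply=["do", "you", "ski", "?"],
)
```

In practice each turn would be sub-word tokenized and accompanied by segment embeddings, but the flattening idea is the same.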

Quick Start & Requirements

  • Install: git clone the repo, cd into it, and run pip install -r requirements.txt. Also requires python -m spacy download en.
  • Docker: Build with docker build -t convai . (ensure sufficient memory allocation).
  • Pretrained Model: Run python interact.py to automatically download and use a fine-tuned model.
  • Dependencies: Python, PyTorch, spaCy, Apex (for FP16). GPU with CUDA is recommended for training.
  • Resources: Training on 8 V100 GPUs takes about an hour.

Highlighted Details

  • Reproduces state-of-the-art results from the ConvAI2 competition.
  • Offers distributed training and FP16 support for faster training.
  • Includes scripts for training, inference, and ConvAI2 evaluation.
  • Fine-tuned model available for immediate use.

Maintenance & Community

This project is associated with Hugging Face. Specific community channels or active maintenance status are not detailed in the README.

Licensing & Compatibility

The repository is licensed under the MIT License. This license permits commercial use and integration with closed-source projects.

Limitations & Caveats

The README notes that results may come in slightly below the original competition numbers without additional tweaks such as custom position embeddings. It also notes that while beam search improves the F1 score, it produces a less compelling human experience than the default nucleus sampling.
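The beam-search-versus-nucleus-sampling trade-off can be illustrated with a minimal top-p filter. This is a plain-Python sketch of the general technique, not the repository's implementation (which operates on PyTorch logits):

```python
import random

def nucleus_sample(probs, top_p=0.9, rng=random):
    """Sample an index from a probability list, restricted to the
    smallest set of highest-probability entries whose cumulative
    mass reaches top_p (the 'nucleus')."""
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    nucleus, mass = [], 0.0
    for idx, p in ranked:
        nucleus.append((idx, p))
        mass += p
        if mass >= top_p:
            break
    # Renormalise within the nucleus, then draw one index.
    r = rng.random() * mass
    for idx, p in nucleus:
        r -= p
        if r <= 0:
            return idx
    return nucleus[-1][0]

# e.g. with top_p=0.7, only the two most likely tokens can be drawn
choice = nucleus_sample([0.5, 0.3, 0.1, 0.1], top_p=0.7)
```

Unlike beam search, which deterministically maximizes likelihood (and tends toward bland, repetitive replies), this keeps some randomness while cutting off the low-probability tail.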

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 1 star in the last 30 days

Explore Similar Projects

Starred by Elvis Saravia (Founder of DAIR.AI) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

awesome-transformer-nlp by cedrickchee

1k stars
Curated list of NLP resources for Transformer networks
Created 6 years ago
Updated 10 months ago
Starred by Lukas Biewald (Cofounder of Weights & Biases), Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), and 2 more.

DialoGPT by microsoft

2k stars
Response generation model via large-scale pretraining
Created 6 years ago
Updated 2 years ago