GPT2-chitchat by yangjianxin1

GPT2 for Chinese chitchat bot, fine-tuned for dialogue

created 5 years ago
3,015 stars

Top 16.2% on sourcepulse

View on GitHub
1 Expert Loves This Project
Project Summary

This repository provides a GPT-2 based model for Chinese chit-chat, inspired by Microsoft's DialoGPT. It offers pre-trained models and code for training, preprocessing, and interacting with a Chinese conversational AI, targeting developers and researchers interested in Chinese dialogue systems.

How It Works

The model is built on HuggingFace's transformers library, using a GPT-2 architecture. For training, multi-turn dialogues are concatenated with special tokens ([CLS] at the start, [SEP] between turns) and fed to the model for autoregressive training. Generation employs temperature, top-k, and nucleus (top-p) sampling to control output diversity and quality. The implementation simplifies DialoGPT's MMI (Maximum Mutual Information) response-reranking method for faster inference.
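The temperature, top-k, and nucleus filtering described above can be sketched in plain Python. This is a toy, list-based illustration of the general technique, not the repository's actual code; interact.py operates on PyTorch logit tensors instead.

```python
import math
import random

def filter_logits(logits, temperature=1.0, top_k=0, top_p=0.0):
    """Apply temperature, top-k, and nucleus (top-p) filtering to raw logits.

    Returns a probability distribution over token ids; filtered-out tokens
    get probability 0. (Toy sketch; the repo works on torch tensors.)
    """
    # Temperature: <1 sharpens the distribution, >1 flattens it.
    scaled = [l / temperature for l in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    keep = set(range(len(probs)))
    # Top-k: keep only the k most probable tokens.
    if top_k > 0:
        ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
        keep &= set(ranked[:top_k])
    # Nucleus (top-p): keep the smallest set whose cumulative mass >= top_p.
    if top_p > 0.0:
        ranked = sorted(range(len(probs)), key=probs.__getitem__, reverse=True)
        nucleus, cum = set(), 0.0
        for i in ranked:
            nucleus.add(i)
            cum += probs[i]
            if cum >= top_p:
                break
        keep &= nucleus

    filtered = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    z = sum(filtered)
    return [p / z for p in filtered]

def sample(probs, rng=random):
    """Draw one token id from the filtered distribution."""
    r, cum = rng.random(), 0.0
    for i, p in enumerate(probs):
        cum += p
        if r < cum:
            return i
    return len(probs) - 1
```

At each decoding step the model's next-token logits pass through this filter, and one id is drawn from the surviving distribution; repeating until [SEP] yields the bot's reply.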

Quick Start & Requirements

  • Install: Clone the repository and install dependencies with pip install -r requirements.txt; the README pins transformers==4.2.0 and pytorch==1.7.0.
  • Prerequisites: Python 3.6+, PyTorch 1.7.0+.
  • Usage: Download pre-trained models from the "模型分享" (model sharing) section. Run interactive chat with python interact.py --model_path <path_to_model>. CPU inference is supported via --no_cuda.
  • Resources: Pre-trained models are available via Baidu Netdisk or Google Drive. Training requires significant computational resources.
  • Links: Model sharing (password: ju6m), 50w (500k-dialogue) corpus (password: 4g5e), 100w (1M-dialogue) corpus (password: s908).
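Under the hood, the interactive script builds each model input by concatenating recent dialogue turns with the special tokens described above. A minimal sketch of that input format (illustration only; the real code works on token ids from a BERT-style tokenizer, and the truncation parameter here mirrors the spirit of the script's history limit):

```python
CLS, SEP = "[CLS]", "[SEP]"

def build_input(history, max_history_len=3):
    """Concatenate the most recent dialogue turns into one model input:
    [CLS] turn1 [SEP] turn2 [SEP] ... turnN [SEP]
    Older turns beyond max_history_len are dropped.
    """
    recent = history[-max_history_len:]
    return CLS + SEP.join(recent) + SEP

# Example: a three-turn exchange (user, bot, user).
text = build_input(["你好", "你好，今天天气怎么样？", "天气不错"])
# → "[CLS]你好[SEP]你好，今天天气怎么样？[SEP]天气不错[SEP]"
```

The model then continues this sequence autoregressively, and everything generated up to the next [SEP] is returned as the reply.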

Highlighted Details

  • Implements MMI ideas from DialoGPT, simplified for speed.
  • Supports various generation sampling methods (Temperature, Top-k, Nucleus).
  • Includes detailed Chinese comments within the code.
  • Offers pre-processed datasets (50w and 100w, i.e. 500k and 1M multi-turn dialogues).
  • Provides pre-trained model weights (e.g., model_epoch40_50w).

Maintenance & Community

The project author has released several related Chinese NLP models, including Firefly (a conversational LLM), LLMPruner, OFA-Chinese, CLIP-Chinese, ClipCap-Chinese, and CPM Chinese text generation. No specific community links (Discord, Slack) are provided.

Licensing & Compatibility

The repository does not explicitly state a license. It references GPT2-Chinese and DialoGPT, which have their own licenses. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project's TODO notes an unresolved issue with imbalanced load during multi-GPU training (多卡并行训练负载不均衡的问题). The pinned dependencies (transformers==4.2.0, pytorch==1.7.0) are relatively old and may cause compatibility issues with newer libraries or hardware.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 9 stars in the last 90 days

Explore Similar Projects

Starred by Lukas Biewald (Cofounder of Weights & Biases), Patrick von Platen (Core Contributor to Hugging Face Transformers and Diffusers), and 1 more.

DialoGPT by microsoft

Response generation model via large-scale pretraining
2k stars · created 6 years ago · updated 2 years ago