Chinese pre-trained transformer for language understanding and generation research
Top 64.0% on sourcepulse
CPT (Chinese Pre-trained Unbalanced Transformer) is a novel Transformer architecture designed for both Chinese Natural Language Understanding (NLU) and Generation (NLG) tasks. It offers a unified approach by employing a shared encoder with specialized decoders for each task, aiming to improve performance and efficiency for Chinese NLP applications.
How It Works
CPT uses an unbalanced Transformer architecture comprising a Shared Encoder (S-Enc) that produces a common semantic representation, an Understanding Decoder (U-Dec) for NLU tasks, and a Generation Decoder (G-Dec) for NLG tasks. The S-Enc is a deep Transformer encoder; the U-Dec is a shallow stack of encoder-style layers with fully visible self-attention; and the G-Dec is a shallow standard Transformer decoder that attends to the S-Enc outputs. This split lets CPT pre-train jointly, with Masked Language Modeling (MLM) on the U-Dec branch and Denoising Autoencoding (DAE) on the G-Dec branch, combining the strengths of encoder-only and encoder-decoder models.
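A minimal PyTorch sketch of this layout is shown below. It is illustrative only: the module names, layer counts, and forward-pass wiring are assumptions made for exposition, not the repository's modeling_cpt.py code.

```python
import torch
import torch.nn as nn

class UnbalancedCPTSketch(nn.Module):
    """Deep shared encoder with shallow understanding/generation branches (illustrative)."""

    def __init__(self, vocab_size=21128, d_model=768, n_heads=12,
                 n_senc=10, n_udec=2, n_gdec=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads)
        dec_layer = nn.TransformerDecoderLayer(d_model, n_heads)
        # S-Enc: deep encoder producing the shared semantic representation.
        self.s_enc = nn.TransformerEncoder(enc_layer, num_layers=n_senc)
        # U-Dec: shallow encoder-style layers (fully visible self-attention) for NLU/MLM.
        self.u_dec = nn.TransformerEncoder(enc_layer, num_layers=n_udec)
        # G-Dec: shallow autoregressive decoder with cross-attention to S-Enc for NLG/DAE.
        self.g_dec = nn.TransformerDecoder(dec_layer, num_layers=n_gdec)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids, decoder_input_ids=None):
        # Tensors are (seq_len, batch); nn.Transformer modules default to that layout.
        memory = self.s_enc(self.embed(input_ids))
        if decoder_input_ids is None:
            # Understanding path: MLM-style token predictions from the U-Dec branch.
            return self.lm_head(self.u_dec(memory))
        # Generation path: causal self-attention over the target plus cross-attention to memory.
        tgt = self.embed(decoder_input_ids)
        causal = torch.triu(
            torch.full((tgt.size(0), tgt.size(0)), float("-inf")), diagonal=1)
        return self.lm_head(self.g_dec(tgt, memory, tgt_mask=causal))
```

Because the heavy computation happens once in the deep S-Enc and both decoders are shallow, per-step decoding for generation stays cheap while NLU tasks still benefit from the full-depth encoding.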
Quick Start & Requirements
Requires pytorch==1.8.1 and transformers==4.4.1. Users must manually import the modeling_cpt.py file into their project. Pre-trained checkpoints are available from the Hugging Face Hub (fnlp/cpt-base, fnlp/cpt-large). Generation tasks are handled through the CPTForConditionalGeneration class.
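A minimal usage sketch follows, assuming modeling_cpt.py has been copied into the working directory and that the fnlp/cpt-* checkpoints load with a BERT-style tokenizer; the mask-infilling prompt and generation settings are illustrative.

```python
from transformers import BertTokenizer
from modeling_cpt import CPTForConditionalGeneration  # manually copied file, not part of transformers

tokenizer = BertTokenizer.from_pretrained("fnlp/cpt-base")
model = CPTForConditionalGeneration.from_pretrained("fnlp/cpt-base")

# Illustrative mask-infilling prompt ("Beijing is the capital of [MASK]").
inputs = tokenizer("北京是[MASK]的首都", return_tensors="pt")
output_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```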
Highlighted Details
Available in cpt-base (10-layer S-Enc, 2-layer U-Dec/G-Dec) and cpt-large (20-layer S-Enc, 4-layer U-Dec/G-Dec) variants.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project requires manual integration of the modeling_cpt.py
file, and the specific license is not clearly indicated, which may pose licensing and compatibility concerns for commercial use. The updated models may show slightly degraded performance on certain downstream tasks due to hyperparameter sensitivity and training dynamics.
Last activity was 2 years ago; the repository is marked inactive.