Chinese pre-trained transformer for language understanding and generation research
Top 64.0% on sourcepulse
CPT (Chinese Pre-trained Unbalanced Transformer) is a novel Transformer architecture designed for both Chinese Natural Language Understanding (NLU) and Generation (NLG) tasks. It offers a unified approach by employing a shared encoder with specialized decoders for each task, aiming to improve performance and efficiency for Chinese NLP applications.
How It Works
CPT uses an unbalanced Transformer architecture comprising a Shared Encoder (S-Enc) that produces a common semantic representation, an Understanding Decoder (U-Dec) for NLU tasks, and a Generation Decoder (G-Dec) for NLG tasks. The S-Enc is a deep Transformer encoder; the U-Dec is a shallow stack of encoder-style layers with fully visible self-attention; and the G-Dec is a shallow standard Transformer decoder that attends to the S-Enc outputs. This split lets CPT pre-train jointly, with Masked Language Modeling (MLM) on the U-Dec branch and Denoising Autoencoding (DAE) on the G-Dec branch, combining the strengths of encoder-only and encoder-decoder models.
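A minimal PyTorch sketch of this layout is shown below. It is illustrative only: the module names, layer counts, and forward-pass wiring are assumptions made for exposition, not the repository's modeling_cpt.py code.

```python
import torch
import torch.nn as nn

class UnbalancedCPTSketch(nn.Module):
    """Deep shared encoder with shallow understanding/generation branches (illustrative)."""

    def __init__(self, vocab_size=21128, d_model=768, n_heads=12,
                 n_senc=10, n_udec=2, n_gdec=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads)
        dec_layer = nn.TransformerDecoderLayer(d_model, n_heads)
        # S-Enc: deep encoder producing the shared semantic representation.
        self.s_enc = nn.TransformerEncoder(enc_layer, num_layers=n_senc)
        # U-Dec: shallow encoder-style layers (fully visible self-attention) for NLU/MLM.
        self.u_dec = nn.TransformerEncoder(enc_layer, num_layers=n_udec)
        # G-Dec: shallow autoregressive decoder with cross-attention to S-Enc for NLG/DAE.
        self.g_dec = nn.TransformerDecoder(dec_layer, num_layers=n_gdec)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, input_ids, decoder_input_ids=None):
        # Tensors are (seq_len, batch); nn.Transformer modules default to that layout.
        memory = self.s_enc(self.embed(input_ids))
        if decoder_input_ids is None:
            # Understanding path: MLM-style token predictions from the U-Dec branch.
            return self.lm_head(self.u_dec(memory))
        # Generation path: causal self-attention over the target plus cross-attention to memory.
        tgt = self.embed(decoder_input_ids)
        causal = torch.triu(
            torch.full((tgt.size(0), tgt.size(0)), float("-inf")), diagonal=1)
        return self.lm_head(self.g_dec(tgt, memory, tgt_mask=causal))
```

Because the heavy computation happens once in the deep S-Enc and both decoders are shallow, per-step decoding for generation stays cheap while NLU tasks still benefit from the full-depth encoding.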
Quick Start & Requirements
Requires pytorch==1.8.1 and transformers==4.4.1. Users must manually import the modeling_cpt.py file into their project. Pre-trained checkpoints are available from the Hugging Face Hub (fnlp/cpt-base, fnlp/cpt-large). Generation tasks are handled through the CPTForConditionalGeneration class.
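A minimal usage sketch follows, assuming modeling_cpt.py has been copied into the working directory and that the fnlp/cpt-* checkpoints load with a BERT-style tokenizer; the mask-infilling prompt and generation settings are illustrative.

```python
from transformers import BertTokenizer
from modeling_cpt import CPTForConditionalGeneration  # manually copied file, not part of transformers

tokenizer = BertTokenizer.from_pretrained("fnlp/cpt-base")
model = CPTForConditionalGeneration.from_pretrained("fnlp/cpt-base")

# Illustrative mask-infilling prompt ("Beijing is the capital of [MASK]").
inputs = tokenizer("北京是[MASK]的首都", return_tensors="pt")
output_ids = model.generate(inputs["input_ids"], num_beams=4, max_length=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```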
Highlighted Details
Available in cpt-base (10-layer S-Enc, 2-layer U-Dec/G-Dec) and cpt-large (20-layer S-Enc, 4-layer U-Dec/G-Dec) variants.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project requires manual integration of the modeling_cpt.py
file, and the specific license is not clearly indicated, which may pose licensing and compatibility concerns for commercial use. The updated models may show slightly degraded performance on certain downstream tasks due to hyperparameter sensitivity and training dynamics.
Last activity was 2 years ago; the repository is marked inactive.