CPT by fastnlp

Chinese pre-trained transformer for language understanding and generation research

created 3 years ago
489 stars

Top 64.0% on sourcepulse

Project Summary

CPT (Chinese Pre-trained Unbalanced Transformer) is a novel Transformer architecture designed for both Chinese Natural Language Understanding (NLU) and Generation (NLG) tasks. It offers a unified approach by employing a shared encoder with specialized decoders for each task, aiming to improve performance and efficiency for Chinese NLP applications.

How It Works

CPT utilizes an unbalanced Transformer architecture comprising a Shared Encoder (S-Enc) that produces common semantic representations, an Understanding Decoder (U-Dec) for NLU tasks, and a Generation Decoder (G-Dec) for NLG tasks. The S-Enc is a deep, full Transformer encoder; the U-Dec is a shallow Transformer encoder stacked on top of it; and the G-Dec is a standard Transformer decoder that attends to the shared representations via cross-attention. This design lets CPT pre-train jointly with Masked Language Modeling (MLM) on the understanding branch and Denoising Auto-Encoding (DAE) on the generation branch, combining the strengths of encoder-only and encoder-decoder models.
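
The unbalanced layout can be pictured with a minimal PyTorch sketch. This is an illustration only, not CPT's actual implementation; the layer counts, model width, and module choices below are assumptions made for the example.

    # Illustrative sketch of the unbalanced layout (NOT the actual CPT code):
    # a deep shared encoder, a shallow encoder-style understanding branch, and a
    # standard Transformer decoder for generation. Tensors are (seq_len, batch,
    # d_model), the default layout for stock PyTorch Transformer modules.
    import torch
    import torch.nn as nn

    d_model, nhead = 768, 12

    # Shared Encoder (S-Enc): deep encoder producing common semantic representations.
    s_enc = nn.TransformerEncoder(nn.TransformerEncoderLayer(d_model, nhead), num_layers=10)

    # Understanding Decoder (U-Dec): shallow encoder stack feeding NLU / MLM heads.
    u_dec = nn.TransformerEncoder(nn.TransformerEncoderLayer(d_model, nhead), num_layers=2)

    # Generation Decoder (G-Dec): standard decoder with cross-attention for NLG / DAE.
    g_dec = nn.TransformerDecoder(nn.TransformerDecoderLayer(d_model, nhead), num_layers=2)

    src = torch.randn(16, 1, d_model)   # dummy source-side embeddings
    tgt = torch.randn(8, 1, d_model)    # dummy target-side embeddings

    memory = s_enc(src)                  # shared representation used by both branches
    nlu_states = u_dec(memory)           # understanding branch output
    nlg_states = g_dec(tgt, memory)      # generation branch output
    print(nlu_states.shape, nlg_states.shape)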

Quick Start & Requirements

  • Install: Requires pytorch==1.8.1 and transformers==4.4.1. Users must manually import the modeling_cpt.py file into their project.
  • Usage: Load pre-trained models via Huggingface Transformers (e.g., fnlp/cpt-base, fnlp/cpt-large).
  • Example: Text generation with CPTForConditionalGeneration (see the sketch after this list).
  • Resources: Pre-trained checkpoints are available for download.
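
A minimal sketch of the described text-generation usage, assuming modeling_cpt.py from this repository is importable and the pinned dependencies are installed; the fnlp/cpt-base checkpoint id, BertTokenizer, and the example sentence follow the project's documented pattern but are shown here as an illustration, not a verbatim copy of the official example.

    # Minimal sketch: masked-span generation with a CPT checkpoint from the Hub.
    # Assumes modeling_cpt.py from this repository is on the Python path and the
    # pinned dependencies (pytorch==1.8.1, transformers==4.4.1) are installed.
    from transformers import BertTokenizer
    from modeling_cpt import CPTForConditionalGeneration

    tokenizer = BertTokenizer.from_pretrained("fnlp/cpt-base")
    model = CPTForConditionalGeneration.from_pretrained("fnlp/cpt-base")

    # Illustrative input: "Beijing is the capital of [MASK]".
    input_ids = tokenizer.encode("北京是[MASK]的首都", return_tensors="pt")
    output_ids = model.generate(input_ids, num_beams=4, max_length=20)
    print(tokenizer.batch_decode(output_ids, skip_special_tokens=True))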

Highlighted Details

  • Offers both cpt-base (10-layer S-Enc, 2-layer U-Dec/G-Dec) and cpt-large (20-layer S-Enc, 4-layer U-Dec/G-Dec) variants.
  • Includes pre-trained weights for Chinese BART as a comparison baseline.
  • Updated models feature an expanded vocabulary (51,271 tokens) and maximum position embeddings extended to 1,024.
  • Achieves competitive performance on Chinese NLU and NLG benchmarks.

Maintenance & Community

  • Contact: yfshao@fudan.edu.cn for issues.
  • Pre-training and fine-tuning code are available separately.

Licensing & Compatibility

  • The repository does not explicitly state a license.
  • Compatible with Huggingface Transformers.

Limitations & Caveats

The project requires manually integrating the modeling_cpt.py file, and the absence of an explicit license may raise concerns for commercial use. The updated models may show slightly degraded performance on certain downstream tasks due to hyperparameter sensitivity and training dynamics.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 4 stars in the last 90 days
