Chinese-XLNet by ymcui

Chinese XLNet pre-trained models for NLP tasks

Created 6 years ago · 1,650 stars · Top 26.1% on sourcepulse

Project Summary

This repository provides pre-trained XLNet models for Chinese natural language processing, aiming to enrich the Chinese NLP ecosystem with diverse model options. It is targeted at researchers and practitioners in Chinese NLP who need robust language models for various downstream tasks.

How It Works

The project offers two Chinese XLNet models: XLNet-mid (24 layers, 768 hidden size, 12 heads, 209M parameters) and XLNet-base (12 layers, 768 hidden size, 12 heads, 117M parameters). These models are trained on a large corpus of Chinese data (5.4B tokens), including Wikipedia and general domain data. The training process follows the official XLNet methodology, utilizing SentencePiece for tokenization and generating TFRecords for training.
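
As a sanity check on the figures above, the published checkpoint's configuration can be inspected without downloading the weights. A minimal sketch, assuming the Huggingface hub name used in the Quick Start below and the standard XLNetConfig attribute names:

    from transformers import AutoConfig

    # Fetch only the model configuration from the Huggingface hub.
    config = AutoConfig.from_pretrained("hfl/chinese-xlnet-mid")
    # Expected to report 24 layers, hidden size 768, and 12 heads.
    print(config.n_layer, config.d_model, config.n_head)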

Quick Start & Requirements

  • Installation: Models can be loaded directly via the Huggingface Transformers library.
    from transformers import AutoTokenizer, AutoModel

    # Both the tokenizer and the weights are pulled from the "hfl" hub namespace.
    tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-xlnet-mid")
    model = AutoModel.from_pretrained("hfl/chinese-xlnet-mid")

  • Prerequisites: Python, Huggingface Transformers (version 2.2.2 or later).
  • Model Downloads: Pre-trained TensorFlow weights are available via Google Drive and Baidu Netdisk; PyTorch weights can be downloaded from the Huggingface hub or converted from the TensorFlow checkpoints.
  • Resources: XLNet-mid model files are approximately 800MB.
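
Once loaded, a short smoke test confirms the checkpoint runs end to end. A minimal sketch, assuming a recent Transformers version (4.x, where the forward pass returns an output object) and the sentencepiece package installed for XLNet tokenization:

    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-xlnet-mid")
    model = AutoModel.from_pretrained("hfl/chinese-xlnet-mid")

    # Encode a short Chinese sentence and extract the final hidden states.
    inputs = tokenizer("哈工大讯飞联合实验室发布中文XLNet模型", return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # One 768-dimensional vector per token: (batch, sequence_length, 768)
    print(outputs.last_hidden_state.shape)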

Highlighted Details

  • Achieves competitive results on Chinese NLP benchmarks like CMRC 2018 (Reading Comprehension) and DRCD (Traditional Chinese Reading Comprehension), outperforming BERT variants in some cases.
  • Provides detailed pre-training and fine-tuning configurations, including commands for data preparation, training, and task-specific fine-tuning on CMRC 2018, DRCD, and ChnSentiCorp (a Transformers-based sketch follows this list).
  • The project is based on the official CMU/Google XLNet implementation.
  • A technical report detailing the models and their performance is available on arXiv.
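
The repository's own fine-tuning commands drive the official TensorFlow codebase. For readers working on the Huggingface side instead, here is a hypothetical sentence-classification sketch in the style of ChnSentiCorp (binary sentiment); the example sentences and settings are illustrative, not taken from the repository:

    from transformers import AutoTokenizer, XLNetForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-xlnet-base")
    # ChnSentiCorp is binary sentiment analysis, hence num_labels=2.
    model = XLNetForSequenceClassification.from_pretrained(
        "hfl/chinese-xlnet-base", num_labels=2
    )

    # Tokenize a toy batch; real fine-tuning would iterate over the dataset
    # with an optimizer or the Trainer API.
    batch = tokenizer(["酒店环境很好", "房间又小又吵"], padding=True, return_tensors="pt")
    outputs = model(**batch)
    print(outputs.logits.shape)  # torch.Size([2, 2])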

Maintenance & Community

  • Developed by the Joint Laboratory of HIT (Harbin Institute of Technology) and iFLYTEK Research (HFL).
  • The project is supported by the Google TensorFlow Research Cloud (TFRC) program.
  • Issues and contributions can be submitted via GitHub Issues and Pull Requests.

Licensing & Compatibility

  • The pre-trained weights are released for technical research reference; any use must stay within the repository's license terms.
  • The project is a third-party effort, not an official product of the XLNet authors or iFlytek.

Limitations & Caveats

  • The pre-training dataset is not publicly available due to copyright issues.
  • Larger models are not guaranteed; they will be released only if they deliver significant performance improvements.
  • Users experiencing poor performance on specific datasets are advised to continue pre-training on their own data or use alternative models.
Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 3 stars in the last 90 days
