XLNet is a generalized autoregressive pretraining method for language understanding, offering state-of-the-art performance on tasks like question answering, natural language inference, and sentiment analysis. It is designed for researchers and practitioners in NLP seeking advanced language representation models.
How It Works
XLNet combines a generalized permutation language modeling objective with the Transformer-XL architecture. By maximizing the expected likelihood over permutations of the factorization order, it captures bidirectional context while remaining autoregressive, avoiding the independence assumption that masked-language models such as BERT make between predicted tokens. Transformer-XL's segment-level recurrence and relative positional encodings let it process longer contexts efficiently.
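As a rough, conceptual illustration of the permutation objective (not the repository's implementation), the sketch below samples one factorization order and builds the corresponding content-stream attention mask, so each position only sees tokens that come no later than it in the sampled order:

```python
import numpy as np

def permutation_attention_mask(seq_len, rng):
    """Sample one factorization order and build the matching attention mask.

    mask[i, j] = 1.0 means position i may attend to position j, i.e. j comes
    no later than i in the sampled permutation (content-stream view).
    """
    order = rng.permutation(seq_len)          # a random factorization order
    rank = np.empty(seq_len, dtype=np.int64)  # rank[pos] = step at which pos is predicted
    rank[order] = np.arange(seq_len)
    mask = (rank[None, :] <= rank[:, None]).astype(np.float32)
    return order, mask

order, mask = permutation_attention_mask(6, np.random.default_rng(0))
print(order)  # the sampled prediction order over 6 positions
print(mask)   # 6x6 visibility matrix induced by that order
```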
Quick Start & Requirements
- Installation: The code is built and tested with TensorFlow 1.13.1 and Python 2. Pre-trained models are released as TensorFlow checkpoints.
- Dependencies: Requires TensorFlow, SentencePiece, and, for GPU acceleration, CUDA/cuDNN. Fine-tuning scripts target TensorFlow 1.x (see the tokenization sketch after this list).
- Resources: XLNet-Large is memory-intensive to fine-tune (e.g., 16GB+ of GPU memory for a single sequence of length 512); efficiently reproducing most SOTA results calls for TPUs or a large number of GPUs (roughly 32-128, depending on the task).
- Links: Paper, Google Groups
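The released checkpoints ship with a SentencePiece model used for tokenization; a minimal sanity check with the standard sentencepiece Python API might look like the following (the checkpoint directory name is an assumption; substitute your download path):

```python
import sentencepiece as spm

# Path is an assumption: point it at the spiece.model inside your downloaded
# pre-trained checkpoint directory (e.g. the XLNet-Large cased release).
sp = spm.SentencePieceProcessor()
sp.Load("xlnet_cased_L-24_H-1024_A-16/spiece.model")

text = "XLNet is a generalized autoregressive pretraining method."
print(sp.EncodeAsPieces(text))  # subword pieces
print(sp.EncodeAsIds(text))     # vocabulary ids fed to the model
```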
Highlighted Details
- Outperforms BERT on 20 tasks, achieving SOTA on 18 as of June 2019.
- Provides both XLNet-Base and XLNet-Large pre-trained models (cased versions).
- Includes fine-tuning scripts for common NLP tasks (classification, SQuAD, RACE) and an abstraction layer for custom usage (see the usage sketch after this list).
- Detailed guidance on data preprocessing and pre-training is available.
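For custom usage, the abstraction layer centers on an XLNetModel class; the sketch below is adapted from the repository's documented example (TF 1.x). Paths, tensor shapes, and the FLAGS object (the absl flags defined by the fine-tuning scripts) are assumptions here rather than verified signatures:

```python
import tensorflow as tf  # TensorFlow 1.x, as required by the repo
import xlnet             # abstraction layer shipped with the repository

# Configuration from the downloaded checkpoint (path is an assumption).
xlnet_config = xlnet.XLNetConfig(json_path="xlnet_cased_L-12_H-768_A-12/xlnet_config.json")
# The fine-tuning scripts build the run config from their command-line FLAGS.
run_config = xlnet.create_run_config(is_training=True, is_finetune=True, FLAGS=FLAGS)

# XLNet's TF-1.x code expects time-major inputs: [seq_len, batch_size].
input_ids = tf.placeholder(tf.int32, [128, 8])
seg_ids = tf.placeholder(tf.int32, [128, 8])
input_mask = tf.placeholder(tf.float32, [128, 8])

xlnet_model = xlnet.XLNetModel(
    xlnet_config=xlnet_config,
    run_config=run_config,
    input_ids=input_ids,
    seg_ids=seg_ids,
    input_mask=input_mask)

summary = xlnet_model.get_pooled_out(summary_type="last")  # pooled representation for classification heads
seq_out = xlnet_model.get_sequence_output()                # per-token hidden states
```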
Maintenance & Community
- Initial release in June 2019, with updates including XLNet-Base in July 2019.
- Community updates and announcements are shared via a Google Groups forum.
Licensing & Compatibility
- The repository itself does not explicitly state a license in the README. The pre-trained models are released for research purposes.
Limitations & Caveats
- The code is tested with TensorFlow 1.13.1 and Python 2, which are outdated.
- Reproducing SOTA results, especially for XLNet-Large, is memory-intensive and often requires TPUs or a large number of GPUs.
- Gradient accumulation is mentioned as an experimental feature to alleviate memory pressure (a generic sketch of the idea follows this list).
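As a generic illustration of gradient accumulation (not the repository's implementation), the idea is to average gradients over several micro-batches before a single optimizer update, trading extra forward/backward passes for lower peak memory:

```python
def accumulated_update(micro_batches, compute_grads, apply_update):
    """Average gradients over micro-batches, then apply one optimizer update.

    compute_grads(batch) -> list of arrays (one gradient per parameter)
    apply_update(grads)  -> performs a single parameter update
    """
    accum = None
    for batch in micro_batches:
        grads = compute_grads(batch)
        if accum is None:
            accum = [g.copy() for g in grads]
        else:
            for a, g in zip(accum, grads):
                a += g
    apply_update([a / len(micro_batches) for a in accum])
```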