xlnet by zihangdai

Research code and pre-trained models for generalized autoregressive language model pretraining

created 6 years ago
6,182 stars

Top 8.5% on sourcepulse

View on GitHub
Project Summary

XLNet is a generalized autoregressive pretraining method for language understanding, offering state-of-the-art performance on tasks like question answering, natural language inference, and sentiment analysis. It is designed for researchers and practitioners in NLP seeking advanced language representation models.

How It Works

XLNet combines a generalized permutation language modeling objective with the Transformer-XL architecture. By maximizing the expected likelihood over permutations of the factorization order, it captures bidirectional context while retaining autoregressive properties, avoiding the artificial independence assumption that BERT makes among masked tokens. Transformer-XL's segment-level recurrence mechanism lets it process longer contexts efficiently.
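
As a toy illustration of the permutation objective (not the repo's actual two-stream attention implementation in modeling.py), the NumPy sketch below builds an attention mask from a randomly sampled factorization order; the function name perm_lm_mask is invented for this example.

    import numpy as np

    def perm_lm_mask(seq_len, rng=None):
        # Attention mask for one sampled factorization order.
        if rng is None:
            rng = np.random.default_rng(0)
        # Sample a factorization order z, e.g. [2, 0, 3, 1].
        order = rng.permutation(seq_len)
        # rank[i] = step at which original position i is predicted.
        rank = np.empty(seq_len, dtype=int)
        rank[order] = np.arange(seq_len)
        # Position i may attend to position j only if j is predicted
        # strictly earlier in the sampled order; tokens keep their
        # original slots, so positional encodings are unchanged.
        mask = rank[None, :] < rank[:, None]
        return order, mask

    order, mask = perm_lm_mask(4)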

Quick Start & Requirements

  • Installation: The code targets TensorFlow 1.13.1 and Python 2. Pre-trained models are released as TensorFlow checkpoints.
  • Dependencies: TensorFlow, SentencePiece, and (for GPU acceleration) CUDA/cuDNN. Fine-tuning scripts are provided for TensorFlow 1.x; a tokenization sketch follows this list.
  • Resources: XLNet-Large needs substantial accelerator memory (e.g., 16GB+ for a single sequence of length 512); reproducing SOTA results efficiently calls for TPUs or a large number of GPUs (32-128 for the large model).
  • Links: Paper (arXiv:1906.08237), Google Groups
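
A minimal sanity check with the released SentencePiece model, assuming the XLNet-Large package (xlnet_cased_L-24_H-1024_A-16) has been downloaded and unpacked; the path below is an assumption, adjust it to your layout.

    import sentencepiece as spm

    # spiece.model ships inside the pre-trained checkpoint archive.
    sp = spm.SentencePieceProcessor()
    sp.Load("xlnet_cased_L-24_H-1024_A-16/spiece.model")

    text = "XLNet uses permutation language modeling."
    print(sp.EncodeAsPieces(text))  # subword pieces
    print(sp.EncodeAsIds(text))     # vocabulary ids

The fine-tuning scripts handle this tokenization internally; running it by hand is mainly useful for checking the environment before launching a long job.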

Highlighted Details

  • Outperforms BERT on 20 tasks, achieving SOTA on 18 as of June 2019.
  • Provides both XLNet-Base and XLNet-Large pre-trained models (cased versions).
  • Includes fine-tuning scripts for common NLP tasks (classification, SQuAD, RACE) and an abstraction layer for custom usage (see the sketch after this list).
  • Detailed guidance on data preprocessing and pre-training is available.
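
For custom usage, the README describes an abstraction layer in xlnet.py; the sketch below follows that interface, though exact signatures should be checked against the repo. FLAGS stands for the flag object defined by the repo's fine-tuning scripts, and the placeholder shapes are illustrative.

    import tensorflow as tf  # TF 1.x
    import xlnet             # from the cloned repository

    seq_len, bsz = 128, 4
    # The repo's input tensors are time-major: [seq_len, batch_size].
    input_ids = tf.placeholder(tf.int32, [seq_len, bsz])
    seg_ids = tf.placeholder(tf.int32, [seq_len, bsz])
    input_mask = tf.placeholder(tf.float32, [seq_len, bsz])

    xlnet_config = xlnet.XLNetConfig(json_path="xlnet_config.json")
    run_config = xlnet.create_run_config(is_training=True,
                                         is_finetune=True,
                                         FLAGS=FLAGS)  # script-defined flags

    model = xlnet.XLNetModel(xlnet_config=xlnet_config,
                             run_config=run_config,
                             input_ids=input_ids,
                             seg_ids=seg_ids,
                             input_mask=input_mask)

    pooled = model.get_pooled_out(summary_type="last")  # [bsz, hidden]
    seq_out = model.get_sequence_output()               # [seq_len, bsz, hidden]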

Maintenance & Community

  • Initial release in June 2019, with updates including XLNet-Base in July 2019.
  • Community updates and announcements are shared via a Google Groups forum.

Licensing & Compatibility

  • The README does not state a license, though the repository ships a LICENSE file (Apache-2.0). The pre-trained models are released for research purposes.

Limitations & Caveats

  • The code is pinned to TensorFlow 1.13.1 and Python 2, both of which have reached end of life.
  • Reproducing SOTA results, especially with XLNet-Large, is memory-intensive and typically requires TPUs or a large number of GPUs.
  • Gradient accumulation is offered as an experimental feature to alleviate memory pressure (a generic sketch follows this list).
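
The repo's gradient-accumulation code is its own; as a generic TF 1.x illustration of the idea (not the repo's implementation), the sketch below splits each optimizer step into k micro-batches so peak memory scales with the micro-batch size.

    import tensorflow as tf  # TF 1.x

    k = 4  # micro-batches per optimizer step (assumption for illustration)
    opt = tf.train.AdamOptimizer(learning_rate=1e-5)
    gvs = opt.compute_gradients(loss)  # `loss` comes from your model graph
    accum = [tf.Variable(tf.zeros_like(v), trainable=False) for _, v in gvs]

    zero_op = tf.group(*[a.assign(tf.zeros_like(a)) for a in accum])
    accum_op = tf.group(*[a.assign_add(g) for a, (g, _) in zip(accum, gvs)])
    apply_op = opt.apply_gradients([(a / k, v) for a, (_, v) in zip(accum, gvs)])

    # Per step: run zero_op once, accum_op on k micro-batches, then apply_op.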

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 9 stars in the last 90 days
