TUPE by guolinke

Transformer research paper exploring positional encoding

created 5 years ago
251 stars

Top 99.8% on sourcepulse

Project Summary

TUPE (Transformer with Untied Positional Encoding) offers an improved positional encoding mechanism for Transformer-based language models, aiming to enhance performance and reduce pre-training costs. It is designed for researchers and practitioners working with large language models who want to experiment with alternative positional encoding strategies.

How It Works

TUPE unties positional information from the word embeddings. Instead of adding position embeddings to token embeddings at the input, the self-attention computes a separate position-to-position correlation term with its own projection matrices and adds it to the word-to-word attention scores; the [CLS] token is additionally untied from positions. This lets the model learn positional information more flexibly. The implementation modifies core fairseq modules (transformer_sentence_encoder.py and multihead_attention.py) to integrate these changes.
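
To make the mechanism concrete, the sketch below shows untied absolute positional attention in PyTorch: a position-to-position attention term, computed from learned position embeddings with their own projection matrices, is summed with the word-to-word term instead of mixing positions into the input embeddings. This is an illustrative sketch, not the repository's implementation; the class and argument names (UntiedSelfAttention, max_positions) are made up, and the paper's special handling of the [CLS] token is omitted.

```python
# Minimal sketch of untied positional attention, in the spirit of TUPE.
# NOT the repository's code: names are illustrative, [CLS] untying is omitted.
import math
import torch
import torch.nn as nn

class UntiedSelfAttention(nn.Module):
    def __init__(self, embed_dim: int, num_heads: int, max_positions: int = 512):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # Projections for the word (content) embeddings.
        self.q_word = nn.Linear(embed_dim, embed_dim)
        self.k_word = nn.Linear(embed_dim, embed_dim)
        self.v_word = nn.Linear(embed_dim, embed_dim)
        # Learned absolute position embeddings with their OWN projections:
        # positions are never added to the word embeddings at the input.
        self.pos_embed = nn.Embedding(max_positions, embed_dim)
        self.q_pos = nn.Linear(embed_dim, embed_dim)
        self.k_pos = nn.Linear(embed_dim, embed_dim)

    def _split(self, h):
        # (batch, seq, embed) -> (batch, heads, seq, head_dim)
        return h.view(h.size(0), -1, self.num_heads, self.head_dim).transpose(1, 2)

    def forward(self, x):  # x: (batch, seq_len, embed_dim)
        b, t, _ = x.shape
        # Word-to-word attention scores.
        qw, kw = self._split(self.q_word(x)), self._split(self.k_word(x))
        v = self._split(self.v_word(x))
        word_scores = qw @ kw.transpose(-2, -1)

        # Position-to-position attention scores, computed from positions alone.
        pos = self.pos_embed(torch.arange(t, device=x.device)).unsqueeze(0)
        qp, kp = self._split(self.q_pos(pos)), self._split(self.k_pos(pos))
        pos_scores = qp @ kp.transpose(-2, -1)  # (1, heads, t, t); broadcasts over batch

        # Sum the two terms and rescale by sqrt(2 * d) so the variance stays
        # comparable to standard scaled dot-product attention.
        attn = torch.softmax((word_scores + pos_scores) / math.sqrt(2 * self.head_dim), dim=-1)
        return (attn @ v).transpose(1, 2).reshape(b, t, -1)
```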

Quick Start & Requirements

  • Installation: pip install --editable . after cloning the repository.
  • Prerequisites: PyTorch, Python >= 3.5, and NVIDIA's Apex library with CUDA extensions; NCCL is recommended for multi-node training. A quick environment check is sketched after this list.
  • Data Pre-processing: Requires mosesdecoder (via git submodule) and specific scripts for pre-training and downstream data.
  • Resources: The pre-training example uses 16 V100 GPUs; fine-tuning examples are also provided.
  • Docs: fairseq
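
A hypothetical pre-flight check (not part of the repository) can confirm these prerequisites before launching pre-training:

```python
# Hypothetical environment check for the prerequisites above (not part of TUPE).
import sys

assert sys.version_info >= (3, 5), "Python >= 3.5 is required"

import torch
print("PyTorch", torch.__version__, "| CUDA available:", torch.cuda.is_available())

try:
    # Apex provides the mixed-precision and fused kernels; it must be built
    # with its CUDA/C++ extensions to be useful for pre-training.
    from apex import amp  # noqa: F401
    print("NVIDIA Apex found")
except ImportError:
    print("NVIDIA Apex not found -- install it from https://github.com/NVIDIA/apex")
```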

Highlighted Details

  • Claims to outperform baselines on the GLUE benchmark while using only about 30% of the pre-training cost.
  • Compatible with larger models like RoBERTa, ELECTRA, and UniLM.
  • The modification is confined to a couple of fairseq modules, making it straightforward to integrate into existing Transformer architectures.
  • Supports mixed-precision training via NVIDIA Apex (see the sketch after this list).
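
To illustrate the last point, the generic apex.amp pattern looks roughly like the sketch below. This is standard Apex usage rather than code taken from the repository; in practice fairseq wires this up when mixed precision is enabled via its --fp16 training flag.

```python
# Generic apex.amp mixed-precision pattern (illustrative, not from this repo).
import torch
from apex import amp

model = torch.nn.Linear(768, 768).cuda()      # stand-in for the Transformer encoder
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# "O1" runs most ops in FP16 while keeping FP32 master weights.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

x = torch.randn(8, 768).cuda()
loss = model(x).pow(2).mean()

# Loss scaling prevents FP16 gradient underflow.
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```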

Maintenance & Community

  • The project is associated with the paper "Rethinking Positional Encoding in Language Pre-training" (ICLR 2021).
  • No explicit community links (Discord/Slack) are provided in the README.

Licensing & Compatibility

  • The repository is built on fairseq, which is released under the MIT license. However, the README does not explicitly state the license for the TUPE code itself.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README notes that the implementation was verified only with BERT-Base due to limited computational resources, so results on larger models may need further validation. The dependency on NVIDIA's Apex library with CUDA extensions can also make installation challenging.

Health Check

  • Last commit: 3 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days
