TUPE by guolinke

Transformer research paper exploring positional encoding

created 5 years ago
251 stars

Top 99.8% on sourcepulse

Project Summary

TUPE (Transformer with Untied Positional Encoding) offers an improved positional encoding mechanism for Transformer-based language models, aiming to enhance performance and reduce pre-training costs. It is designed for researchers and practitioners working with large language models who want to experiment with alternative positional encoding strategies.

How It Works

TUPE unties positional information from the word embeddings. Instead of adding position embeddings to token embeddings at the input, the self-attention computes a separate position-to-position correlation term with its own projection matrices and adds it to the word-to-word attention scores; the [CLS] token is additionally untied from positions. This lets the model learn positional information more flexibly. The implementation modifies core fairseq modules (transformer_sentence_encoder.py and multihead_attention.py) to integrate these changes.
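
To make the mechanism concrete, the sketch below shows untied absolute positional attention in PyTorch: a position-to-position attention term, computed from learned position embeddings with their own projection matrices, is summed with the word-to-word term instead of mixing positions into the input embeddings. This is an illustrative sketch, not the repository's implementation; the class and argument names (UntiedSelfAttention, max_positions) are made up, and the paper's special handling of the [CLS] token is omitted.

```python
# Minimal sketch of untied positional attention, in the spirit of TUPE.
# NOT the repository's code: names are illustrative, [CLS] untying is omitted.
import math
import torch
import torch.nn as nn

class UntiedSelfAttention(nn.Module):
    def __init__(self, embed_dim: int, num_heads: int, max_positions: int = 512):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = embed_dim // num_heads
        # Projections for the word (content) embeddings.
        self.q_word = nn.Linear(embed_dim, embed_dim)
        self.k_word = nn.Linear(embed_dim, embed_dim)
        self.v_word = nn.Linear(embed_dim, embed_dim)
        # Learned absolute position embeddings with their OWN projections:
        # positions are never added to the word embeddings at the input.
        self.pos_embed = nn.Embedding(max_positions, embed_dim)
        self.q_pos = nn.Linear(embed_dim, embed_dim)
        self.k_pos = nn.Linear(embed_dim, embed_dim)

    def _split(self, h):
        # (batch, seq, embed) -> (batch, heads, seq, head_dim)
        return h.view(h.size(0), -1, self.num_heads, self.head_dim).transpose(1, 2)

    def forward(self, x):  # x: (batch, seq_len, embed_dim)
        b, t, _ = x.shape
        # Word-to-word attention scores.
        qw, kw = self._split(self.q_word(x)), self._split(self.k_word(x))
        v = self._split(self.v_word(x))
        word_scores = qw @ kw.transpose(-2, -1)

        # Position-to-position attention scores, computed from positions alone.
        pos = self.pos_embed(torch.arange(t, device=x.device)).unsqueeze(0)
        qp, kp = self._split(self.q_pos(pos)), self._split(self.k_pos(pos))
        pos_scores = qp @ kp.transpose(-2, -1)  # (1, heads, t, t); broadcasts over batch

        # Sum the two terms and rescale by sqrt(2 * d) so the variance stays
        # comparable to standard scaled dot-product attention.
        attn = torch.softmax((word_scores + pos_scores) / math.sqrt(2 * self.head_dim), dim=-1)
        return (attn @ v).transpose(1, 2).reshape(b, t, -1)
```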

Quick Start & Requirements

  • Installation: pip install --editable . after cloning the repository.
  • Prerequisites: PyTorch, Python >= 3.5, and NVIDIA's Apex library with CUDA extensions; NCCL is recommended for multi-node training. A quick environment check is sketched after this list.
  • Data Pre-processing: Requires mosesdecoder (via git submodule) and specific scripts for pre-training and downstream data.
  • Resources: The pre-training example uses 16 V100 GPUs; fine-tuning examples are also provided.
  • Docs: fairseq
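
A hypothetical pre-flight check (not part of the repository) can confirm these prerequisites before launching pre-training:

```python
# Hypothetical environment check for the prerequisites above (not part of TUPE).
import sys

assert sys.version_info >= (3, 5), "Python >= 3.5 is required"

import torch
print("PyTorch", torch.__version__, "| CUDA available:", torch.cuda.is_available())

try:
    # Apex provides the mixed-precision and fused kernels; it must be built
    # with its CUDA/C++ extensions to be useful for pre-training.
    from apex import amp  # noqa: F401
    print("NVIDIA Apex found")
except ImportError:
    print("NVIDIA Apex not found -- install it from https://github.com/NVIDIA/apex")
```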

Highlighted Details

  • Claims to outperform baselines on the GLUE benchmark while using only about 30% of the pre-training cost.
  • Compatible with larger models like RoBERTa, ELECTRA, and UniLM.
  • The modification is confined to a couple of fairseq modules, making it straightforward to integrate into existing Transformer architectures.
  • Supports mixed-precision training via NVIDIA Apex (see the sketch after this list).
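
To illustrate the last point, the generic apex.amp pattern looks roughly like the sketch below. This is standard Apex usage rather than code taken from the repository; in practice fairseq wires this up when mixed precision is enabled via its --fp16 training flag.

```python
# Generic apex.amp mixed-precision pattern (illustrative, not from this repo).
import torch
from apex import amp

model = torch.nn.Linear(768, 768).cuda()      # stand-in for the Transformer encoder
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# "O1" runs most ops in FP16 while keeping FP32 master weights.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

x = torch.randn(8, 768).cuda()
loss = model(x).pow(2).mean()

# Loss scaling prevents FP16 gradient underflow.
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```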

Maintenance & Community

  • The project is associated with the paper "Rethinking Positional Encoding in Language Pre-training" (ICLR 2021).
  • No explicit community links (Discord/Slack) are provided in the README.

Licensing & Compatibility

  • The repository is built on fairseq, which is released under the MIT license. However, the README does not explicitly state the license for the TUPE code itself.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README notes that the implementation was verified only with BERT-Base due to limited computational resources, so results on larger models may need further validation. The dependency on NVIDIA's Apex library with CUDA extensions can also make installation challenging.

Health Check

  • Last commit: 3 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days
