roformer by ZhuiyiTechnology

MLM pre-trained language model using rotary position embedding (RoPE)

Created 4 years ago
1,023 stars

Top 36.6% on SourcePulse

Project Summary

This repository provides RoFormer, a Masked Language Model (MLM) pre-trained with Rotary Position Embedding (RoPE). RoFormer targets researchers and practitioners in Natural Language Processing (NLP) who want to use relative positional encoding to improve Transformer performance. Its key benefits are RoPE's well-grounded theoretical formulation and its compatibility with linear attention mechanisms.

How It Works

RoFormer integrates Rotary Position Embedding (RoPE) into the Transformer architecture. RoPE applies a rotation matrix to each query and key vector, with the rotation angle determined by the token's absolute position. Because the inner product of two rotated vectors depends only on the angle between them, the resulting attention scores depend solely on the relative positions of tokens, a significant advantage over absolute positional encodings. The paper further notes that RoPE is the only relative position embedding method known to be compatible with linear attention.
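A minimal NumPy sketch of this rotation (names are illustrative; the half-split pairing of dimensions used here is one common convention, while the paper rotates adjacent dimension pairs, but the relative-position property holds either way):

```python
import numpy as np

def rotary_embed(x, positions, base=10000.0):
    """Apply a RoPE-style rotation to vectors x of shape [seq, dim] (dim even)."""
    seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)     # theta_i = base^(-2i/dim)
    angles = positions[:, None] * freqs[None, :]  # position m -> angle m * theta_i
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # rotate each (x1_i, x2_i) plane by its position-dependent angle
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

rng = np.random.default_rng(0)
q = rng.standard_normal((1, 8))
k = rng.standard_normal((1, 8))

# attention score for (q at pos 2, k at pos 5) equals (q at pos 0, k at pos 3):
# only the relative offset (here 3) matters
a = rotary_embed(q, np.array([2])) @ rotary_embed(k, np.array([5])).T
b = rotary_embed(q, np.array([0])) @ rotary_embed(k, np.array([3])).T
assert np.allclose(a, b)
```

The final assertion is the point of the construction: shifting both positions by the same amount leaves the query-key score unchanged.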

Quick Start & Requirements

  • Install: pip install bert4keras==0.10.4
  • Prerequisites: TensorFlow. Pre-trained models are available for download.
  • Links: Paper, EleutherAI Blog
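After installing bert4keras and downloading a checkpoint, loading the model can be sketched as follows. This is an assumption-laden sketch, not the repository's verbatim instructions: the directory name mirrors the naming of the released Chinese checkpoints, and paths should point at your own download.

```python
from bert4keras.models import build_transformer_model

# Paths are assumptions: point them at the downloaded checkpoint directory.
config_path = 'chinese_roformer_L-12_H-768_A-12/bert_config.json'
checkpoint_path = 'chinese_roformer_L-12_H-768_A-12/bert_model.ckpt'

# bert4keras selects the RoFormer architecture via model='roformer'
model = build_transformer_model(
    config_path=config_path,
    checkpoint_path=checkpoint_path,
    model='roformer',
    with_mlm=True,  # expose the MLM head for masked-token prediction
)
```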

Highlighted Details

  • Implements Rotary Position Embedding (RoPE) for relative positional encoding.
  • RoPE is theoretically sound and compatible with linear attention.
  • Offers pre-trained models for Chinese language tasks.
  • Pseudo-code and bert4keras implementation provided.

Maintenance & Community

  • The primary author is Jianlin Su.
  • A PyTorch implementation is available via the x-transformers library.

Licensing & Compatibility

  • The repository itself does not explicitly state a license. The associated paper is available on arXiv.

Limitations & Caveats

The project primarily focuses on Chinese language models and relies on the bert4keras library, which may limit broader adoption without additional integration efforts. The licensing status of the repository code is not clearly defined in the README.

Health Check
Last Commit

3 years ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
14 stars in the last 30 days
