roformer by ZhuiyiTechnology

MLM pre-trained language model using rotary position embedding (RoPE)

Created 4 years ago
1,023 stars

Top 36.6% on SourcePulse

Project Summary

This repository provides RoFormer, a Masked Language Model (MLM) pre-trained with Rotary Position Embedding (RoPE). RoFormer targets researchers and practitioners in Natural Language Processing (NLP) who want to use relative positional encoding to improve Transformer performance. Its key benefits are RoPE's well-grounded theoretical formulation and its compatibility with linear attention mechanisms.

How It Works

RoFormer integrates Rotary Position Embedding (RoPE) into the Transformer architecture. RoPE applies a rotation matrix to each query and key vector, with the rotation angle determined by the token's absolute position. Because the inner product of two rotated vectors depends only on the angle between them, the resulting attention scores depend solely on the relative positions of tokens, a significant advantage over absolute positional encodings. The paper further notes that RoPE is the only relative position embedding method known to be compatible with linear attention.
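A minimal NumPy sketch of this rotation (names are illustrative; the half-split pairing of dimensions used here is one common convention, while the paper rotates adjacent dimension pairs, but the relative-position property holds either way):

```python
import numpy as np

def rotary_embed(x, positions, base=10000.0):
    """Apply a RoPE-style rotation to vectors x of shape [seq, dim] (dim even)."""
    seq, dim = x.shape
    half = dim // 2
    freqs = base ** (-np.arange(half) / half)     # theta_i = base^(-2i/dim)
    angles = positions[:, None] * freqs[None, :]  # position m -> angle m * theta_i
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, :half], x[:, half:]
    # rotate each (x1_i, x2_i) plane by its position-dependent angle
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

rng = np.random.default_rng(0)
q = rng.standard_normal((1, 8))
k = rng.standard_normal((1, 8))

# attention score for (q at pos 2, k at pos 5) equals (q at pos 0, k at pos 3):
# only the relative offset (here 3) matters
a = rotary_embed(q, np.array([2])) @ rotary_embed(k, np.array([5])).T
b = rotary_embed(q, np.array([0])) @ rotary_embed(k, np.array([3])).T
assert np.allclose(a, b)
```

The final assertion is the point of the construction: shifting both positions by the same amount leaves the query-key score unchanged.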

Quick Start & Requirements

  • Install: pip install bert4keras==0.10.4
  • Prerequisites: TensorFlow. Pre-trained models are available for download.
  • Links: Paper, EleutherAI Blog
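After installing bert4keras and downloading a checkpoint, loading the model can be sketched as follows. This is an assumption-laden sketch, not the repository's verbatim instructions: the directory name mirrors the naming of the released Chinese checkpoints, and paths should point at your own download.

```python
from bert4keras.models import build_transformer_model

# Paths are assumptions: point them at the downloaded checkpoint directory.
config_path = 'chinese_roformer_L-12_H-768_A-12/bert_config.json'
checkpoint_path = 'chinese_roformer_L-12_H-768_A-12/bert_model.ckpt'

# bert4keras selects the RoFormer architecture via model='roformer'
model = build_transformer_model(
    config_path=config_path,
    checkpoint_path=checkpoint_path,
    model='roformer',
    with_mlm=True,  # expose the MLM head for masked-token prediction
)
```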

Highlighted Details

  • Implements Rotary Position Embedding (RoPE) for relative positional encoding.
  • RoPE is theoretically sound and compatible with linear attention.
  • Offers pre-trained models for Chinese language tasks.
  • Pseudo-code and bert4keras implementation provided.

Maintenance & Community

  • The primary author is Jianlin Su.
  • A PyTorch implementation is available via the x-transformers library.

Licensing & Compatibility

  • The repository itself does not explicitly state a license. The associated paper is available on arXiv.

Limitations & Caveats

The project primarily focuses on Chinese language models and relies on the bert4keras library, which may limit broader adoption without additional integration efforts. The licensing status of the repository code is not clearly defined in the README.

Health Check
Last Commit

3 years ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
14 stars in the last 30 days
