a-PyTorch-Tutorial-to-Transformers by sgrvinod

PyTorch tutorial for Transformer model implementation

created 5 years ago
332 stars

Top 83.7% on sourcepulse

Project Summary

This repository provides a PyTorch implementation and tutorial for the Transformer model, as introduced in the "Attention Is All You Need" paper. It's designed for researchers and practitioners looking to understand and build sequence-to-sequence models, particularly for machine translation, but applicable to other domains like computer vision. The tutorial breaks down the core concepts and implementation details, enabling users to replicate and adapt the architecture.

How It Works

The implementation follows the original Transformer architecture, leveraging multi-head scaled dot-product attention as the core mechanism. This attention allows tokens to weigh the importance of other tokens in the sequence, enabling parallel processing and capturing long-range dependencies, which is a significant advantage over traditional RNNs. The model consists of an encoder to process the input sequence and a decoder to generate the output sequence, with positional embeddings used to inject sequence order information.
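As a rough illustration of the mechanism described above, the sketch below computes scaled dot-product attention and fixed sinusoidal position encodings in plain PyTorch. The function names, tensor shapes, and the sinusoidal variant are this summary's own choices for illustration, not the repository's code.

    import math
    import torch

    def scaled_dot_product_attention(q, k, v, mask=None):
        # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as in the paper.
        d_k = q.size(-1)
        scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float("-inf"))  # hide padding / future tokens
        weights = torch.softmax(scores, dim=-1)
        return torch.matmul(weights, v), weights

    def sinusoidal_positions(max_len, d_model):
        # Fixed sinusoidal encodings from the paper (the tutorial's variant may differ).
        pos = torch.arange(max_len, dtype=torch.float).unsqueeze(1)   # (max_len, 1)
        dim = torch.arange(0, d_model, 2, dtype=torch.float)          # even dimensions
        angle = pos / (10000 ** (dim / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(angle)
        pe[:, 1::2] = torch.cos(angle)
        return pe

    # Toy check: 2 sequences, 4 heads, 5 tokens, 16 dims per head.
    q = k = v = torch.randn(2, 4, 5, 16)
    out, attn = scaled_dot_product_attention(q, k, v)
    print(out.shape, attn.shape)              # (2, 4, 5, 16) and (2, 4, 5, 5)
    print(sinusoidal_positions(5, 16).shape)  # (5, 16)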

Quick Start & Requirements

  • Install/Run: run python prepare_data.py to prepare the data, then python train.py to train; use python translate.py for inference.
  • Prerequisites: Python 3.6+, PyTorch 1.4+. Requires downloading WMT14 English-German translation datasets.
  • Setup: Data preparation involves downloading and processing ~4.5 million sentence pairs. Training requires significant computational resources (e.g., a single RTX 2080Ti GPU with gradient accumulation over a long training run; a gradient-accumulation sketch follows this list).
  • Links: Official Repository
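Gradient accumulation is a general PyTorch pattern rather than anything specific to this repository; a hypothetical sketch (the model, loss, and step counts below are placeholders) looks like this:

    import torch

    accum_steps = 8                                    # effective batch = 8 x mini-batch
    model = torch.nn.Linear(512, 512)                  # placeholder for the Transformer
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = torch.nn.MSELoss()

    for step in range(32):                             # placeholder training loop
        x, y = torch.randn(16, 512), torch.randn(16, 512)
        loss = criterion(model(x), y) / accum_steps    # scale so accumulated grads average out
        loss.backward()                                # gradients add up across mini-batches
        if (step + 1) % accum_steps == 0:
            optimizer.step()                           # one weight update per accumulation window
            optimizer.zero_grad()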

Highlighted Details

  • Detailed explanation and implementation of core Transformer components: attention, positional embeddings, encoder-decoder structure.
  • Uses Byte Pair Encoding (BPE) for subword tokenization, enabling handling of unseen words.
  • Includes a custom SequenceLoader for efficient batching based on target sequence length (a simplified batching sketch follows this list).
  • Provides visualizations of attention patterns across different layers and heads.
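The length-based batching idea can be sketched as below. This is a hypothetical simplification (the pair format, function name, and token budget are invented here for illustration), not the repository's SequenceLoader.

    from itertools import groupby
    import random

    def make_batches(pairs, tokens_per_batch=2000):
        # pairs: list of (source_ids, target_ids) token-ID lists.
        # Group sentences with the same target length so each batch roughly
        # fills a fixed target-token budget, then shuffle the batch order.
        pairs = sorted(pairs, key=lambda p: len(p[1]))
        batches = []
        for length, group in groupby(pairs, key=lambda p: len(p[1])):
            group = list(group)
            per_batch = max(1, tokens_per_batch // max(1, length))
            for i in range(0, len(group), per_batch):
                batches.append(group[i:i + per_batch])
        random.shuffle(batches)
        return batches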

Maintenance & Community

The repository is authored by sgrvinod. Issues can be posted on the GitHub repository for questions, suggestions, or corrections.

Licensing & Compatibility

The repository does not explicitly state a license in the README. Users should verify licensing for commercial or closed-source use.

Limitations & Caveats

The implementation targets PyTorch 1.4 and Python 3.6, both of which are now dated. Some implementation details differ from the original paper, often following the official implementation's choices or incorporating improvements not described in the paper. The reported BLEU score (26.49) is slightly below the paper's 27.3.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 26 stars in the last 90 days
