pytorch-original-transformer by gordicaleksa

PyTorch implementation of the original Transformer paper for learning

Created 4 years ago · 1,037 stars · Top 36.8% on sourcepulse

Project Summary

This repository provides a PyTorch implementation of the original Transformer model (Vaswani et al.) for educational purposes, targeting engineers and researchers looking to understand and experiment with the architecture. It includes well-commented code, visualization tools for complex concepts like positional encodings and learning rate schedules, and pre-trained models for machine translation tasks.
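
As a point of reference, the sinusoidal positional encoding that the repo visualizes can be written in a few lines of PyTorch. The snippet below is a minimal standalone sketch of the formulation from the paper; the function name and structure are illustrative and will differ from the repository's own module.

    import torch

    def sinusoidal_positional_encodings(max_len, d_model):
        # PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
        # PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
        positions = torch.arange(max_len, dtype=torch.float).unsqueeze(1)              # (max_len, 1)
        div_terms = torch.pow(10000.0, torch.arange(0, d_model, 2).float() / d_model)  # (d_model / 2,)
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(positions / div_terms)  # even embedding dimensions
        pe[:, 1::2] = torch.cos(positions / div_terms)  # odd embedding dimensions
        return pe  # added to the token embeddings before the first layer

    # The paper's base model uses d_model = 512
    print(sinusoidal_positional_encodings(max_len=128, d_model=512).shape)  # torch.Size([128, 512])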

How It Works

The implementation follows the "Attention Is All You Need" paper, using self-attention and multi-head attention to model long-range dependencies while allowing parallel computation across positions. It adds a custom data-loading wrapper with caching for faster iteration and ships visualizations of key components (positional encodings, the learning rate schedule, and attention weights) to aid comprehension.
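
The core operation is scaled dot-product attention. A minimal sketch under standard assumptions is shown below; tensor shapes and argument names are illustrative, and the repository's implementation may organize this differently.

    import math
    import torch

    def scaled_dot_product_attention(queries, keys, values, mask=None):
        # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
        # queries / keys / values: (batch, num_heads, seq_len, d_k)
        d_k = queries.size(-1)
        scores = torch.matmul(queries, keys.transpose(-2, -1)) / math.sqrt(d_k)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float('-inf'))  # hide padding / future tokens
        weights = torch.softmax(scores, dim=-1)  # the weights shown in the attention visualizations
        return torch.matmul(weights, values), weights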

Quick Start & Requirements

  • Install: Clone the repo, then run conda env create -f environment.yml and conda activate pytorch-transformer.
  • Prerequisites: PyTorch, SpaCy (models for English and German are downloaded automatically), Anaconda/Miniconda. GPU with CUDA is highly recommended for training.
  • Setup Time: Initial environment setup and model downloads may take a while.
  • Docs: The Annotated Transformer ++.ipynb notebook in the repository.

Highlighted Details

  • Visualizations for positional encodings, learning rate schedules, and label smoothing (see the learning rate sketch after this list).
  • Pre-trained models for IWSLT English-German and German-English translation.
  • Custom data loading wrapper for ~30x speedup over standard torchtext.
  • Attention visualization for encoder and decoder layers.
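
The learning rate schedule referenced above is the warmup schedule from the paper. The function below is a sketch of that formula; the default d_model and warmup values match the paper's base setup, but the repository's scheduler may expose them differently.

    def paper_learning_rate(step, d_model=512, warmup_steps=4000):
        # lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
        step = max(step, 1)  # guard against step 0
        return (d_model ** -0.5) * min(step ** -0.5, step * warmup_steps ** -1.5)

    # Linear warmup for the first 4000 steps, then inverse-square-root decay
    print(paper_learning_rate(1000), paper_learning_rate(4000), paper_learning_rate(100000))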

Maintenance & Community

  • The project was last updated in 2020.
  • The README links the author's YouTube channel, LinkedIn, Twitter, and Medium for related content.

Licensing & Compatibility

  • The repository does not explicitly state a license.

Limitations & Caveats

  • The original Transformer architecture is no longer state-of-the-art; the implementation is intended for learning rather than production use.
  • Multi-GPU/multi-node training support is listed as a TODO.
  • Beam decoding is also a TODO.
  • BPE and shared source-target vocab are pending implementation.

Health Check

  • Last commit: 4 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 12 stars in the last 90 days
