pytorch-original-transformer by gordicaleksa

PyTorch implementation of the original Transformer paper for learning

Created 4 years ago
1,044 stars

Top 36.0% on SourcePulse

View on GitHub
Project Summary

This repository provides a PyTorch implementation of the original Transformer model (Vaswani et al.) for educational purposes, targeting engineers and researchers looking to understand and experiment with the architecture. It includes well-commented code, visualization tools for complex concepts like positional encodings and learning rate schedules, and pre-trained models for machine translation tasks.

How It Works

The implementation follows the "Attention Is All You Need" paper, leveraging self-attention and multi-head attention mechanisms to model long-range dependencies and enable parallelization. It uses custom data loading with caching for improved performance and includes visualizations for key components like positional encodings, learning rate schedules, and attention mechanisms, aiding comprehension.
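As a rough orientation for those two mechanisms, here is a minimal PyTorch sketch of scaled dot-product attention and the fixed sinusoidal positional encodings from the paper. It illustrates the equations only and is not the repository's actual code.

```python
# Minimal sketch of the paper's equations; not the repository's code.
import math
import torch


def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, head_dim)
    d_k = q.size(-1)
    scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)  # these weights are what attention visualizations show
    return torch.matmul(weights, v), weights


def sinusoidal_positional_encodings(max_len, d_model):
    # PE(pos, 2i) = sin(pos / 10000^(2i/d_model)), PE(pos, 2i+1) = cos(...); assumes even d_model
    positions = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)
    div_term = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32) * (-math.log(10000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(positions * div_term)
    pe[:, 1::2] = torch.cos(positions * div_term)
    return pe
```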

Quick Start & Requirements

  • Install: Clone the repo, then run conda env create -f environment.yml and conda activate pytorch-transformer.
  • Prerequisites: Anaconda/Miniconda, PyTorch, and spaCy (the English and German models are downloaded automatically). A CUDA-capable GPU is highly recommended for training; a post-install sanity check is sketched after this list.
  • Setup Time: Initial environment setup and model downloads may take a while.
  • Docs: the accompanying notebook, The Annotated Transformer ++.ipynb.
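
Once the environment is activated, a quick check along these lines confirms that PyTorch, CUDA, and the spaCy pipelines are available. This is a hypothetical snippet, not part of the repo, and the spaCy model names below are assumptions; the project's own setup downloads the English and German models automatically.

```python
# Hypothetical post-install sanity check; not part of the repository.
import torch
import spacy

print("PyTorch", torch.__version__, "| CUDA available:", torch.cuda.is_available())

# Assumed spaCy pipeline names; the repo's setup handles the actual downloads.
nlp_en = spacy.load("en_core_web_sm")
nlp_de = spacy.load("de_core_news_sm")
print("spaCy pipelines loaded:", nlp_en.meta["lang"], nlp_de.meta["lang"])
```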

Highlighted Details

  • Visualizations for positional encodings, learning rate schedules, and label smoothing (the schedule formula is sketched after this list).
  • Pre-trained models for IWSLT English-German and German-English translation.
  • Custom data loading wrapper for ~30x speedup over standard torchtext.
  • Attention visualization for encoder and decoder layers.
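
The learning-rate curve being visualized is the warmup schedule from the paper, lrate = d_model^(-0.5) * min(step^(-0.5), step * warmup_steps^(-1.5)). A one-function sketch of that formula (illustrative only, not the repository's implementation):

```python
# Warmup learning-rate schedule from "Attention Is All You Need" (illustrative sketch).
def transformer_lr(step, d_model=512, warmup_steps=4000):
    step = max(step, 1)  # avoid division by zero at step 0
    return (d_model ** -0.5) * min(step ** -0.5, step * warmup_steps ** -1.5)

# The rate rises linearly for warmup_steps, then decays proportionally to 1/sqrt(step).
print(transformer_lr(100), transformer_lr(4000), transformer_lr(100000))
```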

Maintenance & Community

  • The project was last updated in 2020.
  • The author provides links to a YouTube channel, LinkedIn, Twitter, and Medium for related content.

Licensing & Compatibility

  • The repository does not explicitly state a license.

Limitations & Caveats

  • The original Transformer architecture is no longer state-of-the-art; the project is aimed at learning and experimentation rather than production use.
  • Multi-GPU/multi-node training support is listed as a TODO.
  • Beam decoding is also a TODO.
  • BPE and shared source-target vocab are pending implementation.

Health Check

  • Last Commit: 4 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 7 stars in the last 30 days

Explore Similar Projects

  • METER by zdou0830: multimodal framework for vision-and-language transformer research. 373 stars; created 3 years ago, updated 2 years ago.
  • open_flamingo by mlfoundations: open-source framework for training large multimodal models. 4k stars; created 2 years ago, updated 1 year ago.