pytorch-original-transformer by gordicaleksa

PyTorch implementation of the original Transformer paper for learning

Created 4 years ago · 1,037 stars · Top 36.8% on sourcepulse

Project Summary

This repository provides a PyTorch implementation of the original Transformer model (Vaswani et al.) for educational purposes, targeting engineers and researchers looking to understand and experiment with the architecture. It includes well-commented code, visualization tools for complex concepts like positional encodings and learning rate schedules, and pre-trained models for machine translation tasks.
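
As a point of reference, the sinusoidal positional encoding that the repo visualizes can be written in a few lines of PyTorch. The snippet below is a minimal standalone sketch of the formulation from the paper; the function name and structure are illustrative and will differ from the repository's own module.

    import torch

    def sinusoidal_positional_encodings(max_len, d_model):
        # PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
        # PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))
        positions = torch.arange(max_len, dtype=torch.float).unsqueeze(1)              # (max_len, 1)
        div_terms = torch.pow(10000.0, torch.arange(0, d_model, 2).float() / d_model)  # (d_model / 2,)
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(positions / div_terms)  # even embedding dimensions
        pe[:, 1::2] = torch.cos(positions / div_terms)  # odd embedding dimensions
        return pe  # added to the token embeddings before the first layer

    # The paper's base model uses d_model = 512
    print(sinusoidal_positional_encodings(max_len=128, d_model=512).shape)  # torch.Size([128, 512])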

How It Works

The implementation follows the "Attention Is All You Need" paper, using self-attention and multi-head attention to model long-range dependencies while allowing parallel computation across positions. It adds a custom data-loading wrapper with caching for faster iteration and ships visualizations of key components (positional encodings, the learning rate schedule, and attention weights) to aid comprehension.
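
The core operation is scaled dot-product attention. A minimal sketch under standard assumptions is shown below; tensor shapes and argument names are illustrative, and the repository's implementation may organize this differently.

    import math
    import torch

    def scaled_dot_product_attention(queries, keys, values, mask=None):
        # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
        # queries / keys / values: (batch, num_heads, seq_len, d_k)
        d_k = queries.size(-1)
        scores = torch.matmul(queries, keys.transpose(-2, -1)) / math.sqrt(d_k)
        if mask is not None:
            scores = scores.masked_fill(mask == 0, float('-inf'))  # hide padding / future tokens
        weights = torch.softmax(scores, dim=-1)  # the weights shown in the attention visualizations
        return torch.matmul(weights, values), weights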

Quick Start & Requirements

  • Install: Clone the repo, then run conda env create -f environment.yml and conda activate pytorch-transformer.
  • Prerequisites: PyTorch, SpaCy (models for English and German are downloaded automatically), Anaconda/Miniconda. GPU with CUDA is highly recommended for training.
  • Setup Time: Initial environment setup and model downloads may take a while.
  • Docs: The Annotated Transformer ++.ipynb notebook in the repository.

Highlighted Details

  • Visualizations for positional encodings, learning rate schedules, and label smoothing (see the learning rate sketch after this list).
  • Pre-trained models for IWSLT English-German and German-English translation.
  • Custom data loading wrapper for ~30x speedup over standard torchtext.
  • Attention visualization for encoder and decoder layers.
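
The learning rate schedule referenced above is the warmup schedule from the paper. The function below is a sketch of that formula; the default d_model and warmup values match the paper's base setup, but the repository's scheduler may expose them differently.

    def paper_learning_rate(step, d_model=512, warmup_steps=4000):
        # lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
        step = max(step, 1)  # guard against step 0
        return (d_model ** -0.5) * min(step ** -0.5, step * warmup_steps ** -1.5)

    # Linear warmup for the first 4000 steps, then inverse-square-root decay
    print(paper_learning_rate(1000), paper_learning_rate(4000), paper_learning_rate(100000))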

Maintenance & Community

  • The project was last updated in 2020.
  • The README links the author's YouTube channel, LinkedIn, Twitter, and Medium for related content.

Licensing & Compatibility

  • The repository does not explicitly state a license.

Limitations & Caveats

  • The original Transformer architecture is no longer state-of-the-art; the implementation is intended for learning rather than production use.
  • Multi-GPU/multi-node training support is listed as a TODO.
  • Beam decoding is also a TODO.
  • BPE and shared source-target vocab are pending implementation.

Health Check

  • Last commit: 4 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 12 stars in the last 90 days
