PyTorch implementation of nGPT, a normalized GPT learning on the hypersphere
This repository provides a PyTorch implementation of nGPT (normalized GPT), a Transformer variant that learns on the hypersphere. It aims to improve Transformer performance, particularly in areas like continual learning and reinforcement learning, by normalizing attention queries and keys. The project is suitable for researchers and practitioners interested in exploring novel Transformer architectures.
How It Works
nGPT modifies the standard Transformer architecture by l2-normalizing the query and key vectors before the attention calculation, so that attention scores become cosine similarities. This hypersphere-inspired constraint aims to improve training stability; its effect on expressivity is an open question (see Limitations & Caveats). The implementation is a direct translation of the concepts presented in the associated research paper.
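As a minimal sketch of the idea (not the repository's code), the attention scores can be computed from l2-normalized queries and keys; since dot products of unit vectors lie in [-1, 1], a softmax temperature is needed, and the scale value and function name below are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def normalized_attention(q, k, v, scale = 10.0):
    # project q and k onto the unit hypersphere along the feature dimension,
    # so q.k becomes a cosine similarity
    q = F.normalize(q, dim = -1)
    k = F.normalize(k, dim = -1)

    # cosine similarities are bounded in [-1, 1], so a scale (here an
    # assumed value) sharpens the softmax in place of 1/sqrt(d)
    sim = torch.einsum('b h i d, b h j d -> b h i j', q, k) * scale
    attn = sim.softmax(dim = -1)
    return torch.einsum('b h i j, b h j d -> b h i d', attn, v)

q = k = v = torch.randn(1, 8, 16, 64)   # (batch, heads, seq, dim_head)
out = normalized_attention(q, k, v)     # (1, 8, 16, 64)
```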
Quick Start & Requirements
pip install nGPT-pytorch
python train.py
This trains a model on the Enwik8 dataset.

Highlighted Details
The attn_norm_qk parameter enables normalized attention, i.e. l2-normalization of the query and key vectors.
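A hedged usage sketch follows, assuming the package exposes an nGPT class whose constructor accepts this flag; all argument names and values other than attn_norm_qk are illustrative assumptions and should be verified against the repository:

```python
import torch
from nGPT_pytorch import nGPT  # import path assumed from the package name

# hyperparameters below are illustrative assumptions, not documented defaults
model = nGPT(
    num_tokens = 256,      # byte-level vocabulary (assumed)
    dim = 512,
    depth = 8,
    attn_norm_qk = True    # enable normalized attention, per the README
)

ids = torch.randint(0, 256, (1, 1024))  # dummy byte sequence
logits = model(ids)                     # assumed to return per-token logits
```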
Maintenance & Community

The project is associated with the author "lucidrains," known for various PyTorch implementations of recent deep learning models. No specific community channels or roadmap are detailed in the README.
Licensing & Compatibility
The README does not explicitly state a license. Given the author's typical practices and the nature of such implementations, it is likely MIT or a similar permissive license, but this should be verified. Compatibility with commercial or closed-source projects is assumed to be high if it is indeed MIT licensed.
Limitations & Caveats
The implementation is described as a "quick implementation," suggesting it may not include all optimizations or features of a production-ready library. The README raises a question about potential loss of expressivity due to the normalization, which warrants further investigation by users.