nGPT-pytorch by lucidrains

PyTorch implementation of nGPT, a normalized GPT that learns on the hypersphere

created 9 months ago
288 stars

Top 92.1% on sourcepulse

Project Summary

This repository provides a PyTorch implementation of nGPT (normalized GPT), a Transformer variant that learns on the hypersphere. It aims to improve Transformer performance, particularly in areas like continual learning and reinforcement learning, by normalizing attention queries and keys. The project is suitable for researchers and practitioners interested in exploring novel Transformer architectures.

How It Works

nGPT modifies the standard Transformer architecture by normalizing the query and key vectors before the attention calculation. This approach, inspired by cosine-similarity attention and hypersphere learning, aims to improve training stability by constraining representations to unit norm. The implementation is a direct translation of the concepts presented in the associated research paper.
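As a concrete illustration of the normalized attention described above (a minimal sketch, not the repository's code), queries and keys are L2-normalized so their dot products become cosine similarities; the nGPT paper pairs this with learned scaling factors, which this sketch replaces with a fixed factor:

```python
import torch
import torch.nn.functional as F

def qk_normalized_attention(q, k, v, scale):
    # L2-normalize queries and keys so their dot products are cosine similarities
    q = F.normalize(q, dim = -1)
    k = F.normalize(k, dim = -1)
    # with unit-norm q/k, the usual 1/sqrt(d) temperature no longer applies;
    # a fixed scale stands in for the paper's learned scaling factors
    sim = q @ k.transpose(-2, -1) * scale
    return sim.softmax(dim = -1) @ v

# toy shapes: (batch, heads, seq_len, dim_head)
q, k, v = (torch.randn(1, 8, 128, 64) for _ in range(3))
out = qk_normalized_attention(q, k, v, scale = 64 ** 0.5)  # -> (1, 8, 128, 64)
```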

Quick Start & Requirements

  • Install: pip install nGPT-pytorch
  • Requirements: PyTorch, standard Python libraries.
  • Usage: See the Python snippet in the README for basic model instantiation and a forward pass; an illustrative sketch follows this list.
  • Testing: python train.py runs a training demo on the enwik8 dataset.
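For orientation, here is a minimal usage sketch. The nGPT class name and most hyperparameters are assumptions based on the package name and lucidrains' usual conventions; only attn_norm_qk is confirmed by this summary, so consult the README snippet for the authoritative interface:

```python
import torch
from nGPT_pytorch import nGPT  # class name assumed from the package name

model = nGPT(
    num_tokens = 256,       # hypothetical hyperparameters
    dim = 512,
    depth = 8,
    attn_norm_qk = True     # the normalized-attention flag highlighted below
)

ids = torch.randint(0, 256, (1, 1024))  # dummy token ids

loss = model(ids, return_loss = True)   # assumed autoregressive-loss interface
loss.backward()
```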

Highlighted Details

  • Implements nGPT, a normalized Transformer architecture.
  • Focuses on learning representations on the hypersphere (see the sketch after this list).
  • Includes an attn_norm_qk parameter for enabling query/key normalization.
  • Citations provided for related research on normalized Transformers and value residual learning.
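To make "learning on the hypersphere" concrete, the sketch below shows the normalized residual update described in the nGPT paper: hidden states are interpolated toward each block's output and re-projected to unit norm (the paper learns the step size alpha; a constant stands in here). This is illustrative, not the repository's code:

```python
import torch
import torch.nn.functional as F

def hypersphere_residual_step(h, block_out, alpha = 0.1):
    # move h toward the block output along the chord, then re-project onto
    # the unit hypersphere; the paper uses a learned per-dimension alpha
    h = h + alpha * (block_out - h)
    return F.normalize(h, dim = -1)

# unit-norm hidden states and a (pretend) attention/MLP block output
h = F.normalize(torch.randn(1, 16, 512), dim = -1)
block_out = F.normalize(torch.randn(1, 16, 512), dim = -1)
h = hypersphere_residual_step(h, block_out)  # h stays on the hypersphere
```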

Maintenance & Community

The project is associated with the author "lucidrains," known for various PyTorch implementations of recent deep learning models. No specific community channels or roadmap are detailed in the README.

Licensing & Compatibility

The README does not explicitly state a license. Given the author's typical practices and the nature of such implementations, it is likely MIT or a similar permissive license, but this should be verified. Compatibility with commercial or closed-source projects is assumed to be high if it is indeed MIT licensed.

Limitations & Caveats

The implementation is described as a "quick implementation," suggesting it may not include all optimizations or features of a production-ready library. The README raises a question about potential loss of expressivity due to the normalization, which warrants further investigation by users.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 7 stars in the last 90 days

Explore Similar Projects

Starred by Chip Huyen (author of AI Engineering and Designing Machine Learning Systems), Jeff Hammerbacher (cofounder of Cloudera), and 6 more.

x-transformers by lucidrains

  • Top 0.2% · 5k stars
  • Transformer library with extensive experimental features
  • Created 4 years ago; updated 3 days ago
Starred by Andrej Karpathy (founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), Phil Wang (prolific research paper implementer), and 4 more.

vit-pytorch by lucidrains

  • Top 0.2% · 24k stars
  • PyTorch library for Vision Transformer variants and related techniques
  • Created 4 years ago; updated 6 days ago