x-transformers by lucidrains

Transformer library with extensive experimental features

Created 4 years ago
5,575 stars

Top 9.1% on SourcePulse

Project Summary

This repository provides a highly modular and configurable implementation of the Transformer architecture, catering to researchers and practitioners seeking to experiment with state-of-the-art variations. It offers a comprehensive suite of attention mechanisms, normalization techniques, and architectural modifications, enabling the construction of diverse Transformer models for NLP and vision tasks.

How It Works

The library implements Transformer models using a flexible wrapper-based design. Users can instantiate core components like Encoder, Decoder, and XTransformer (encoder-decoder), then customize them with numerous parameters that enable features such as Flash Attention, Rotary Positional Embeddings, ALiBi, various normalization schemes (RMSNorm, ScaleNorm, LayerNorm variants), GLU feedforwards, and more. This modularity allows for fine-grained control over architectural choices, facilitating rapid prototyping and empirical study of Transformer variants.
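
For illustration, a decoder-only language model with a few of these variants switched on might look like the sketch below. The flag names follow the project's README conventions and may differ slightly between versions.

```python
import torch
from x_transformers import TransformerWrapper, Decoder

# Decoder-only language model; the keyword flags toggle architectural
# variants (names follow the README and may vary across versions).
model = TransformerWrapper(
    num_tokens = 20000,          # vocabulary size
    max_seq_len = 1024,
    attn_layers = Decoder(
        dim = 512,
        depth = 6,
        heads = 8,
        rotary_pos_emb = True,   # Rotary Positional Embeddings
        attn_flash = True,       # Flash Attention kernel
        use_rmsnorm = True,      # RMSNorm instead of LayerNorm
        ff_glu = True            # GLU feedforward
    )
)

tokens = torch.randint(0, 20000, (1, 1024))
logits = model(tokens)           # shape: (1, 1024, 20000)
```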

Quick Start & Requirements

  • Install via pip: pip install x-transformers
  • Requires PyTorch.
  • GPU with CUDA is recommended for performance.
  • A minimal encoder-decoder smoke test is sketched below.
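
As a quick smoke test, the encoder-decoder wrapper can be exercised roughly as follows (based on the README's examples; exact argument names may differ across versions):

```python
import torch
from x_transformers import XTransformer

# Full encoder-decoder Transformer; argument names follow the README.
model = XTransformer(
    dim = 512,
    enc_num_tokens = 256,
    enc_depth = 6,
    enc_heads = 8,
    enc_max_seq_len = 1024,
    dec_num_tokens = 256,
    dec_depth = 6,
    dec_heads = 8,
    dec_max_seq_len = 1024,
    tie_token_emb = True         # share encoder / decoder token embeddings
)

src = torch.randint(0, 256, (1, 1024))
tgt = torch.randint(0, 256, (1, 1024))
src_mask = torch.ones_like(src).bool()

loss = model(src, tgt, mask = src_mask)  # returns cross-entropy loss
loss.backward()
```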

Highlighted Details

  • Extensive support for experimental Transformer features from recent research papers.
  • Includes implementations for vision tasks (e.g., SimpleViT, PaLI); see the image-classification sketch after this list.
  • Offers specialized wrappers for Transformer-XL recurrence and continuous embeddings.
  • Integrates Flash Attention for significant speed and memory improvements.
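
For the vision side, an image classifier can be assembled from the ViT wrapper plus a plain Encoder, along the lines of the sketch below (it mirrors the README's SimpleViT-style example; exact names may vary by version):

```python
import torch
from x_transformers import ViTransformerWrapper, Encoder

# ViT-style image classifier built from the library's encoder stack.
model = ViTransformerWrapper(
    image_size = 256,
    patch_size = 32,
    num_classes = 1000,
    attn_layers = Encoder(
        dim = 512,
        depth = 6,
        heads = 8
    )
)

img = torch.randn(1, 3, 256, 256)
preds = model(img)   # class logits, shape: (1, 1000)
```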

Maintenance & Community

The project is actively maintained by lucidrains, with contributions from the broader AI research community. Links to relevant papers and discussions are often included within the code and README.

Licensing & Compatibility

The repository is released under the MIT license, allowing broad use in research and commercial applications.

Limitations & Caveats

The sheer number of configurable options can make for a steep learning curve, and some experimental features may be less stable or need careful hyperparameter tuning. The README is dense with code examples and research-paper references, so a solid grasp of Transformer architectures is needed to take full advantage of the library.

Health Check

  • Last Commit: 5 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 58 stars in the last 30 days

Explore Similar Projects

Starred by Jiayi Pan (Author of SWE-Gym; MTS at xAI), Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), and 1 more.

METER by zdou0830

373 stars
Multimodal framework for vision-and-language transformer research
Created 3 years ago
Updated 2 years ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Wing Lian (Founder of Axolotl AI), and 10 more.

open_flamingo by mlfoundations

4k stars
Open-source framework for training large multimodal models
Created 2 years ago
Updated 1 year ago