xlstm by NX-AI

Recurrent neural network architecture based on the original LSTM

Created 1 year ago
1,982 stars

Top 22.4% on SourcePulse

View on GitHub
Project Summary

xLSTM introduces a novel Recurrent Neural Network architecture designed to overcome the limitations of traditional LSTMs, offering competitive performance against Transformers and State Space Models, particularly in language modeling. It targets researchers and developers seeking efficient and powerful sequence modeling capabilities.

How It Works

xLSTM augments the LSTM with exponential gating, a matrix memory, and stabilization techniques, aiming to improve long-range dependency modeling and computational efficiency. Its two core components, the mLSTM (matrix memory, fully parallelizable) and the sLSTM (scalar memory with recurrent memory mixing), can be combined in flexible configurations, enabling the stack to be tailored to different tasks.
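
As a concrete illustration, the sketch below configures a small block stack that mixes mLSTM and sLSTM blocks. It is adapted from the configuration pattern shown in the project's README and is only a minimal sketch: exact class and field names may differ between releases, and the sLSTM CUDA backend can be swapped for a pure-PyTorch backend on non-NVIDIA hardware.

```python
import torch
from xlstm import (
    xLSTMBlockStack,
    xLSTMBlockStackConfig,
    mLSTMBlockConfig,
    mLSTMLayerConfig,
    sLSTMBlockConfig,
    sLSTMLayerConfig,
    FeedForwardConfig,
)

# Stack of 7 blocks: mLSTM everywhere except an sLSTM block at position 1.
cfg = xLSTMBlockStackConfig(
    mlstm_block=mLSTMBlockConfig(
        mlstm=mLSTMLayerConfig(conv1d_kernel_size=4, qkv_proj_blocksize=4, num_heads=4)
    ),
    slstm_block=sLSTMBlockConfig(
        slstm=sLSTMLayerConfig(
            backend="cuda",  # "vanilla" selects the native PyTorch fallback
            num_heads=4,
            conv1d_kernel_size=4,
            bias_init="powerlaw_blockdependent",
        ),
        feedforward=FeedForwardConfig(proj_factor=1.3, act_fn="gelu"),
    ),
    context_length=256,
    num_blocks=7,
    embedding_dim=128,
    slstm_at=[1],
)

model = xLSTMBlockStack(cfg).to("cuda")
x = torch.randn(4, 256, 128, device="cuda")  # (batch, sequence, embedding)
y = model(x)                                 # output has the same shape as x
```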

Quick Start & Requirements

  • Installation: pip install xlstm (for the 7B model, install pip install mlstm_kernels followed by pip install xlstm; see the loading sketch after this list). A conda environment file (environment_pt240cu124.yaml) is provided for a tested setup.
  • Prerequisites: PyTorch (>=1.8), CUDA (>=12.4 recommended for optimized kernels). NVIDIA GPUs with Compute Capability >= 8.0 are required for the optimized CUDA kernels. Triton kernels are used for the xLSTM Large 7B model.
  • Resources: Training a 7B parameter model requires significant computational resources.
  • Demos: A demo.ipynb notebook is available for the xLSTM Large architecture.
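
For the 7B model, the Hugging Face weights can be loaded through the transformers library once mlstm_kernels and xlstm are installed. The sketch below is an assumption-laden example: the model id NX-AI/xLSTM-7b and native xLSTM support in a recent transformers release are assumed, and loading the full 7B checkpoint requires a GPU with substantial memory.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NX-AI/xLSTM-7b"  # assumed Hugging Face id of the 7B checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("The xLSTM architecture", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```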

Highlighted Details

  • Features a 7B parameter xLSTM Language Model trained on 2.3T tokens.
  • Provides standalone implementations for both the original NeurIPS paper architecture and the optimized xLSTM Large 7B model.
  • Includes experimental setups demonstrating the benefits of the mLSTM and sLSTM components on tasks such as Parity and Multi-Query Associative Recall (a toy illustration of the parity task follows this list).
  • Optimized kernels (Triton) are available for enhanced performance on NVIDIA GPUs.
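
To make the Parity experiment concrete, the snippet below generates a toy parity dataset of the kind used in such state-tracking benchmarks. This is a hypothetical helper for illustration only, not the repository's experiment code; the actual setups live in the repo's experiment configs.

```python
import torch

def make_parity_batch(batch_size: int = 64, seq_len: int = 64):
    """Toy parity task: for each prefix, predict whether it contains an odd number of ones.

    Sequence-level state tracking like this is where the sLSTM's recurrent
    memory mixing is reported to help.
    """
    bits = torch.randint(0, 2, (batch_size, seq_len))  # random 0/1 inputs
    labels = bits.cumsum(dim=-1) % 2                   # running parity at each position
    return bits, labels

x, y = make_parity_batch()
print(x.shape, y.shape)  # torch.Size([64, 64]) torch.Size([64, 64])
```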

Maintenance & Community

The project is associated with Sepp Hochreiter, co-inventor of the original LSTM. Links to Hugging Face model weights and arXiv papers are provided. No explicit community channels (Discord, Slack) are mentioned in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The presence of CUDA and Triton kernels suggests a focus on NVIDIA hardware, though native PyTorch implementations are recommended for other platforms.

Limitations & Caveats

The README focuses on NVIDIA GPU optimization; compatibility with other hardware (AMD, Apple Metal) relies on native PyTorch implementations, which may be less performant. The experimental training loops lack early stopping or test evaluation.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 2
  • Issues (30d): 7

Star History

24 stars in the last 30 days

Explore Similar Projects

Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), Yaowei Zheng (Author of LLaMA-Factory), and 4 more.

ml-cross-entropy by apple

0.4%
520 stars
PyTorch module for memory-efficient cross-entropy in LLMs
Created 10 months ago
Updated 1 day ago
Starred by Ying Sheng (Coauthor of SGLang) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

llm-analysis by cli99

0.4%
455 stars
CLI tool for LLM latency/memory analysis during training/inference
Created 2 years ago
Updated 5 months ago
Starred by Wing Lian (Founder of Axolotl AI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 2 more.

recurrent-pretraining by seal-rg

0%
827 stars
Pretraining code for depth-recurrent language model research
Created 7 months ago
Updated 1 week ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Pawel Garbacki (Cofounder of Fireworks AI), and 11 more.

Liger-Kernel by linkedin

0.6%
6k stars
Triton kernels for efficient LLM training
Created 1 year ago
Updated 1 day ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jiayi Pan (Author of SWE-Gym; MTS at xAI), and 34 more.

flash-attention by Dao-AILab

0.6%
20k stars
Fast, memory-efficient attention implementation
Created 3 years ago
Updated 1 day ago