Recurrent neural network architecture based on the original LSTM
xLSTM introduces a novel recurrent neural network architecture designed to overcome the limitations of traditional LSTMs, offering performance competitive with Transformers and State Space Models, particularly in language modeling. It targets researchers and developers who need efficient, powerful sequence modeling.
How It Works
xLSTM extends the LSTM with exponential gating (plus normalization and stabilization techniques) and revised memory structures: the sLSTM keeps a scalar memory with a new memory-mixing scheme, while the mLSTM uses a fully parallelizable matrix memory with a covariance update rule. This aims to improve long-range dependency modeling and computational efficiency. The two block types (mLSTM and sLSTM) can be mixed in arbitrary ratios within a residual block stack, enabling architectures tailored to different tasks.
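As an illustration, here is a minimal sketch of stacking mLSTM and sLSTM blocks with the library's config-based API; the class and parameter names (xLSTMBlockStack, xLSTMBlockStackConfig, mLSTMBlockConfig, sLSTMBlockConfig, slstm_at, and so on) follow the repository's documented interface, but exact names and defaults may differ across versions.

```python
import torch
from xlstm import (
    xLSTMBlockStack,
    xLSTMBlockStackConfig,
    mLSTMBlockConfig,
    mLSTMLayerConfig,
    sLSTMBlockConfig,
    sLSTMLayerConfig,
    FeedForwardConfig,
)

# Sketch of a 7-block residual stack: block index 1 is an sLSTM block,
# the remaining blocks are mLSTM blocks.
cfg = xLSTMBlockStackConfig(
    mlstm_block=mLSTMBlockConfig(
        mlstm=mLSTMLayerConfig(conv1d_kernel_size=4, qkv_proj_blocksize=4, num_heads=4)
    ),
    slstm_block=sLSTMBlockConfig(
        slstm=sLSTMLayerConfig(backend="cuda", num_heads=4, conv1d_kernel_size=4),
        feedforward=FeedForwardConfig(proj_factor=1.3, act_fn="gelu"),
    ),
    context_length=256,
    num_blocks=7,
    embedding_dim=128,
    slstm_at=[1],  # positions of sLSTM blocks; all other positions use mLSTM
)

model = xLSTMBlockStack(cfg).to("cuda")
x = torch.randn(4, 256, 128, device="cuda")  # (batch, sequence, embedding)
y = model(x)                                 # output keeps the input shape
assert y.shape == x.shape
```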
Quick Start & Requirements
Install with pip install xlstm (or pip install mlstm_kernels followed by pip install xlstm for the 7B model). A conda environment file (environment_pt240cu124.yaml) is provided for a tested setup. A demo.ipynb notebook is available for the xLSTM Large architecture.
Highlighted Details
Maintenance & Community
The project is associated with Sepp Hochreiter, a key figure in LSTM development. Links to Hugging Face model weights and arXiv papers are provided. No explicit community channels (Discord, Slack) are mentioned in the README.
Licensing & Compatibility
The repository does not explicitly state a license. The presence of CUDA and Triton kernels suggests a focus on NVIDIA hardware, though native PyTorch implementations are recommended for other platforms.
Limitations & Caveats
The README focuses on NVIDIA GPU optimization; compatibility with other hardware (AMD, Apple Metal) relies on native PyTorch implementations, which may be less performant. The experimental training loops lack early stopping or test evaluation.
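For non-NVIDIA setups, the following is a hedged sketch of selecting the slower pure-PyTorch path for the sLSTM layer; the backend field and the "vanilla" value are assumptions based on the repository's config API and may differ across versions.

```python
from xlstm import sLSTMBlockConfig, sLSTMLayerConfig

# Assumed: "vanilla" selects the native PyTorch sLSTM implementation instead
# of the fused CUDA kernel, trading speed for portability (CPU, AMD, Apple).
slstm_block = sLSTMBlockConfig(
    slstm=sLSTMLayerConfig(backend="vanilla", num_heads=4, conv1d_kernel_size=4)
)
```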