xlstm by NX-AI

Recurrent neural network architecture based on the original LSTM

created 1 year ago
1,940 stars

Top 23.1% on sourcepulse

View on GitHub
1 Expert Loves This Project
Project Summary

xLSTM introduces a novel Recurrent Neural Network architecture designed to overcome the limitations of traditional LSTMs, offering competitive performance against Transformers and State Space Models, particularly in language modeling. It targets researchers and developers seeking efficient and powerful sequence modeling capabilities.

How It Works

xLSTM combines an "Exponential Gating" mechanism with a "Matrix Memory" and stabilization techniques, aiming to improve long-range dependency modeling and computational efficiency. Its two core block types, the mLSTM (a fully parallelizable cell with matrix memory) and the sLSTM (a recurrent cell with scalar memory and memory mixing), can be combined in various configurations, enabling tailored performance for different tasks.
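
To make the block composition concrete, here is a minimal sketch of a mixed mLSTM/sLSTM stack using the xlstm package's configuration API as shown in the project README; config field names and defaults may vary between releases, and the portable "vanilla" sLSTM backend stands in here for the optimized "cuda" one.

    import torch
    from xlstm import (
        xLSTMBlockStack,
        xLSTMBlockStackConfig,
        mLSTMBlockConfig,
        mLSTMLayerConfig,
        sLSTMBlockConfig,
        sLSTMLayerConfig,
        FeedForwardConfig,
    )

    # Seven blocks total; block index 1 is an sLSTM, the rest are mLSTMs.
    cfg = xLSTMBlockStackConfig(
        mlstm_block=mLSTMBlockConfig(
            mlstm=mLSTMLayerConfig(
                conv1d_kernel_size=4, qkv_proj_blocksize=4, num_heads=4
            )
        ),
        slstm_block=sLSTMBlockConfig(
            # "vanilla" is the plain-PyTorch backend; "cuda" enables the
            # optimized kernels on GPUs with compute capability >= 8.0.
            slstm=sLSTMLayerConfig(
                backend="vanilla", num_heads=4, conv1d_kernel_size=4
            ),
            feedforward=FeedForwardConfig(proj_factor=1.3, act_fn="gelu"),
        ),
        context_length=256,
        num_blocks=7,
        embedding_dim=128,
        slstm_at=[1],
    )

    stack = xLSTMBlockStack(cfg)
    x = torch.randn(4, 256, 128)  # (batch, sequence, embedding)
    y = stack(x)                  # output has the same shape as the input
    assert y.shape == x.shape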

Quick Start & Requirements

  • Installation: pip install xlstm (or pip install mlstm_kernels followed by pip install xlstm for the 7B model; see the loading sketch after this list). A conda environment file (environment_pt240cu124.yaml) is provided for a tested setup.
  • Prerequisites: PyTorch (>=1.8), CUDA (>=12.4 recommended for optimized kernels). NVIDIA GPUs with Compute Capability >= 8.0 are required for the optimized CUDA kernels. Triton kernels are used for the xLSTM Large 7B model.
  • Resources: Training a 7B parameter model requires significant computational resources.
  • Demos: A demo.ipynb notebook is available for the xLSTM Large architecture.
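
For the 7B model, the weights linked from the README can be loaded through the standard Hugging Face transformers interface. A minimal sketch, assuming the published NX-AI/xLSTM-7b model ID, the mlstm_kernels and xlstm packages installed, and a transformers release with xLSTM support:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumes `pip install mlstm_kernels xlstm transformers` and an NVIDIA GPU;
    # the model ID follows the Hugging Face weights linked in the README.
    tokenizer = AutoTokenizer.from_pretrained("NX-AI/xLSTM-7b")
    model = AutoModelForCausalLM.from_pretrained(
        "NX-AI/xLSTM-7b", device_map="auto"
    )

    inputs = tokenizer("In a nutshell, xLSTM is", return_tensors="pt")
    inputs = inputs.to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))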

Highlighted Details

  • Features a 7B parameter xLSTM Language Model trained on 2.3T tokens.
  • Provides standalone implementations for both the original NeurIPS paper architecture and the optimized xLSTM Large 7B model.
  • Includes experimental setups demonstrating the benefits of mLSTM and sLSTM components on tasks like Parity and Multi-Query Associative Recall.
  • Optimized kernels (Triton) are available for enhanced performance on NVIDIA GPUs.

Maintenance & Community

The project is associated with Sepp Hochreiter, co-inventor of the original LSTM. Links to Hugging Face model weights and arXiv papers are provided. No explicit community channels (Discord, Slack) are mentioned in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The presence of CUDA and Triton kernels suggests a focus on NVIDIA hardware, though native PyTorch implementations are recommended for other platforms.

Limitations & Caveats

The README focuses on NVIDIA GPU optimization; compatibility with other hardware (AMD, Apple Metal) relies on native PyTorch implementations, which may be less performant. The experimental training loops lack early stopping or test evaluation.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 1
  • Issues (30d): 2

Star History

111 stars in the last 90 days

Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jiayi Pan (Author of SWE-Gym; AI Researcher at UC Berkeley), and 5 more.

Explore Similar Projects

Liger-Kernel by linkedin

Top 0.6% · 5k stars
Triton kernels for efficient LLM training
created 1 year ago · updated 2 days ago
Starred by Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), Abhishek Thakur (World's First 4x Kaggle Grandmaster), and 5 more.

xlnet by zihangdai

Top 0.0% · 6k stars
Language model research paper using generalized autoregressive pretraining
created 6 years ago · updated 2 years ago