xlstm by NX-AI

Recurrent neural network architecture based on the original LSTM

Created 1 year ago
1,982 stars

Top 22.4% on SourcePulse

View on GitHub
Project Summary

xLSTM introduces a novel Recurrent Neural Network architecture designed to overcome the limitations of traditional LSTMs, offering competitive performance against Transformers and State Space Models, particularly in language modeling. It targets researchers and developers seeking efficient and powerful sequence modeling capabilities.

How It Works

xLSTM augments the LSTM with exponential gating, a matrix memory, and stabilization techniques, aiming to improve long-range dependency modeling and computational efficiency. Its two core components, the mLSTM (matrix memory, fully parallelizable) and the sLSTM (scalar memory with recurrent memory mixing), can be combined in flexible configurations, enabling the stack to be tailored to different tasks.
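
As a concrete illustration, the sketch below configures a small block stack that mixes mLSTM and sLSTM blocks. It is adapted from the configuration pattern shown in the project's README and is only a minimal sketch: exact class and field names may differ between releases, and the sLSTM CUDA backend can be swapped for a pure-PyTorch backend on non-NVIDIA hardware.

```python
import torch
from xlstm import (
    xLSTMBlockStack,
    xLSTMBlockStackConfig,
    mLSTMBlockConfig,
    mLSTMLayerConfig,
    sLSTMBlockConfig,
    sLSTMLayerConfig,
    FeedForwardConfig,
)

# Stack of 7 blocks: mLSTM everywhere except an sLSTM block at position 1.
cfg = xLSTMBlockStackConfig(
    mlstm_block=mLSTMBlockConfig(
        mlstm=mLSTMLayerConfig(conv1d_kernel_size=4, qkv_proj_blocksize=4, num_heads=4)
    ),
    slstm_block=sLSTMBlockConfig(
        slstm=sLSTMLayerConfig(
            backend="cuda",  # "vanilla" selects the native PyTorch fallback
            num_heads=4,
            conv1d_kernel_size=4,
            bias_init="powerlaw_blockdependent",
        ),
        feedforward=FeedForwardConfig(proj_factor=1.3, act_fn="gelu"),
    ),
    context_length=256,
    num_blocks=7,
    embedding_dim=128,
    slstm_at=[1],
)

model = xLSTMBlockStack(cfg).to("cuda")
x = torch.randn(4, 256, 128, device="cuda")  # (batch, sequence, embedding)
y = model(x)                                 # output has the same shape as x
```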

Quick Start & Requirements

  • Installation: pip install xlstm (for the 7B model, install pip install mlstm_kernels followed by pip install xlstm; see the loading sketch after this list). A conda environment file (environment_pt240cu124.yaml) is provided for a tested setup.
  • Prerequisites: PyTorch (>=1.8), CUDA (>=12.4 recommended for optimized kernels). NVIDIA GPUs with Compute Capability >= 8.0 are required for the optimized CUDA kernels. Triton kernels are used for the xLSTM Large 7B model.
  • Resources: Training a 7B parameter model requires significant computational resources.
  • Demos: A demo.ipynb notebook is available for the xLSTM Large architecture.
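
For the 7B model, the Hugging Face weights can be loaded through the transformers library once mlstm_kernels and xlstm are installed. The sketch below is an assumption-laden example: the model id NX-AI/xLSTM-7b and native xLSTM support in a recent transformers release are assumed, and loading the full 7B checkpoint requires a GPU with substantial memory.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NX-AI/xLSTM-7b"  # assumed Hugging Face id of the 7B checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("The xLSTM architecture", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```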

Highlighted Details

  • Features a 7B parameter xLSTM Language Model trained on 2.3T tokens.
  • Provides standalone implementations for both the original NeurIPS paper architecture and the optimized xLSTM Large 7B model.
  • Includes experimental setups demonstrating the benefits of the mLSTM and sLSTM components on tasks such as Parity and Multi-Query Associative Recall (a toy illustration of the parity task follows this list).
  • Optimized kernels (Triton) are available for enhanced performance on NVIDIA GPUs.
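
To make the Parity experiment concrete, the snippet below generates a toy parity dataset of the kind used in such state-tracking benchmarks. This is a hypothetical helper for illustration only, not the repository's experiment code; the actual setups live in the repo's experiment configs.

```python
import torch

def make_parity_batch(batch_size: int = 64, seq_len: int = 64):
    """Toy parity task: for each prefix, predict whether it contains an odd number of ones.

    Sequence-level state tracking like this is where the sLSTM's recurrent
    memory mixing is reported to help.
    """
    bits = torch.randint(0, 2, (batch_size, seq_len))  # random 0/1 inputs
    labels = bits.cumsum(dim=-1) % 2                   # running parity at each position
    return bits, labels

x, y = make_parity_batch()
print(x.shape, y.shape)  # torch.Size([64, 64]) torch.Size([64, 64])
```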

Maintenance & Community

The project is associated with Sepp Hochreiter, co-inventor of the original LSTM. Links to Hugging Face model weights and arXiv papers are provided. No explicit community channels (Discord, Slack) are mentioned in the README.

Licensing & Compatibility

The repository does not explicitly state a license. The presence of CUDA and Triton kernels suggests a focus on NVIDIA hardware, though native PyTorch implementations are recommended for other platforms.

Limitations & Caveats

The README focuses on NVIDIA GPU optimization; compatibility with other hardware (AMD, Apple Metal) relies on native PyTorch implementations, which may be less performant. The experimental training loops lack early stopping or test evaluation.

Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 2
  • Issues (30d): 7

Star History

24 stars in the last 30 days

Explore Similar Projects

Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), Yaowei Zheng (Author of LLaMA-Factory), and 4 more.

ml-cross-entropy by apple

0.4%
520 stars
PyTorch module for memory-efficient cross-entropy in LLMs
Created 10 months ago
Updated 1 day ago
Starred by Ying Sheng (Coauthor of SGLang) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

llm-analysis by cli99

0.4%
455 stars
CLI tool for LLM latency/memory analysis during training/inference
Created 2 years ago
Updated 5 months ago
Starred by Wing Lian (Founder of Axolotl AI), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 2 more.

recurrent-pretraining by seal-rg

0%
827 stars
Pretraining code for depth-recurrent language model research
Created 7 months ago
Updated 1 week ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Pawel Garbacki (Cofounder of Fireworks AI), and 11 more.

Liger-Kernel by linkedin

0.6%
6k stars
Triton kernels for efficient LLM training
Created 1 year ago
Updated 1 day ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jiayi Pan (Author of SWE-Gym; MTS at xAI), and 34 more.

flash-attention by Dao-AILab

0.6%
20k stars
Fast, memory-efficient attention implementation
Created 3 years ago
Updated 1 day ago