flash-linear-attention by fla-org

Efficient Torch/Triton implementations for linear attention models

created 1 year ago
2,977 stars

Top 16.5% on sourcepulse

Project Summary

This repository provides efficient Triton-based implementations of state-of-the-art linear attention models for researchers and developers working with large language models. It offers optimized kernels and model integrations for various linear attention architectures, aiming to improve training and inference speed.

How It Works

The project uses Python, PyTorch, and Triton to implement hardware-efficient kernels for linear attention mechanisms. To reduce memory usage and increase throughput, it fuses adjacent operations into single kernels (e.g., norm layers with gating, or linear layers with the cross-entropy loss) and applies parallelization strategies tailored to these architectures. Writing the kernels in Triton gives fine-grained control over GPU execution, enabling significant performance gains over standard PyTorch implementations.
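
As a rough sketch of the fusion idea described above (illustrative only, not a kernel from this repository), the following Triton kernel applies a sigmoid gate and the elementwise product in a single memory pass; the function names and block size are hypothetical.

```python
# Illustrative sketch of operator fusion in Triton (not an FLA kernel).
# Computes y = x * sigmoid(g) in one pass, avoiding an intermediate tensor in HBM.
import torch
import triton
import triton.language as tl

@triton.jit
def fused_gate_kernel(x_ptr, g_ptr, y_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements
    x = tl.load(x_ptr + offs, mask=mask)
    g = tl.load(g_ptr + offs, mask=mask)
    # Gate and multiply in registers; only the final result is written back.
    tl.store(y_ptr + offs, x * tl.sigmoid(g), mask=mask)

def fused_gate(x: torch.Tensor, g: torch.Tensor) -> torch.Tensor:
    y = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)
    fused_gate_kernel[grid](x, g, y, n, BLOCK=1024)
    return y

# Usage (CUDA required):
#   x = torch.randn(8192, device='cuda'); g = torch.randn(8192, device='cuda')
#   y = fused_gate(x, g)   # matches x * torch.sigmoid(g)
```

The real kernels in the repository fuse far larger computation graphs, but the memory-traffic argument is the same: fewer round-trips to GPU memory per layer.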

Quick Start & Requirements

  • Install: pip install --no-use-pep517 flash-linear-attention, or install from source for the latest features; a minimal usage sketch follows this list.
  • Prerequisites: PyTorch >= 2.5, Triton >= 3.0 (or nightly), einops, datasets, causal-conv1d.
  • Resources: Requires a CUDA-enabled GPU.
  • Docs: https://github.com/fla-org/flash-linear-attention
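
A minimal usage sketch, assuming the layer interface shown in the project README (a MultiScaleRetention layer taking hidden_size and num_heads); the exact signature may differ across versions.

```python
# Sketch only: assumes fla.layers.MultiScaleRetention(hidden_size=..., num_heads=...)
# as in the project README; check the repository for the current API.
import torch
from fla.layers import MultiScaleRetention

device, dtype = 'cuda', torch.bfloat16
layer = MultiScaleRetention(hidden_size=1024, num_heads=4).to(device=device, dtype=dtype)

x = torch.randn(2, 2048, 1024, device=device, dtype=dtype)  # (batch, seq_len, hidden)
y, *_ = layer(x)  # README-style unpacking: output first, auxiliary values after
print(y.shape)    # expected: torch.Size([2, 2048, 1024])
```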

Highlighted Details

  • Implements a wide range of linear attention models including RetNet, GLA, Mamba, HGRN, RWKV, and more.
  • Offers fused modules for improved training efficiency, such as fused cross-entropy and linear layers.
  • Provides compatibility with the Hugging Face Transformers library for easy integration and text generation (see the generation sketch after this list).
  • Includes benchmarking scripts and evaluation examples using lm-evaluation-harness.
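
An illustrative generation sketch via the Transformers integration; the checkpoint name below is an assumption, so substitute a model actually published under the fla-hub organization on the Hugging Face Hub.

```python
# Sketch of generation through the Hugging Face Transformers integration.
# 'fla-hub/gla-1.3B-100B' is an assumed checkpoint name; replace with a real one.
import fla  # per the README, importing fla registers its model classes with Transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = 'fla-hub/gla-1.3B-100B'
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16).cuda()

inputs = tokenizer("Linear attention scales", return_tensors="pt").to('cuda')
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```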

Maintenance & Community

  • Active development with frequent updates and new model implementations.
  • Community support via Discord: https://discord.gg/vDaJTmKNcS
  • Sponsors include Intel Corporation and Bitdeer.

Licensing & Compatibility

  • The README does not explicitly state a license; check the repository's LICENSE file before commercial use or linking into closed-source projects.

Limitations & Caveats

  • The project is under active development, with features and APIs subject to change.
  • Individual model implementations vary in maturity and may require model-specific configuration.
  • The lack of an explicit license in the README is a significant caveat for adoption.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 34
  • Issues (30d): 19

Star History

  • 670 stars in the last 90 days
