Efficient Torch/Triton implementations for linear attention models
This repository provides efficient Triton-based implementations of state-of-the-art linear attention models for researchers and developers working with large language models. It offers optimized kernels and model integrations for various linear attention architectures, aiming to improve training and inference speed.
How It Works
The project leverages Python, PyTorch, and Triton to implement hardware-efficient kernels for linear attention mechanisms. It focuses on optimizing computations like fused operations (e.g., norm layers with gating, linear layers with cross-entropy loss) and parallelization strategies to reduce memory usage and increase throughput. The use of Triton allows for fine-grained control over GPU execution, enabling significant performance gains over standard PyTorch implementations.
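To illustrate the computation these kernels accelerate, here is a minimal pure-PyTorch reference of the causal linear attention recurrence. This is an illustrative sketch, not the library's implementation; function and tensor names are made up for the example.

```python
import torch

def linear_attention_reference(q, k, v):
    """Naive recurrent form of causal linear attention.

    q, k, v: (batch, heads, seq_len, head_dim)
    Each step updates a (head_dim x head_dim) state with the outer product
    k_t^T v_t and reads it out with the current query.
    """
    b, h, L, d = q.shape
    state = q.new_zeros(b, h, d, d)           # running sum of outer products
    outputs = []
    for t in range(L):
        k_t = k[:, :, t]                       # (b, h, d)
        v_t = v[:, :, t]                       # (b, h, d)
        state = state + k_t.unsqueeze(-1) * v_t.unsqueeze(-2)   # rank-1 state update
        o_t = torch.einsum('bhd,bhde->bhe', q[:, :, t], state)  # query the state
        outputs.append(o_t)
    return torch.stack(outputs, dim=2)         # (b, h, L, d)

# Small shapes for a quick CPU sanity check
q = torch.randn(2, 4, 128, 64)
k = torch.randn(2, 4, 128, 64)
v = torch.randn(2, 4, 128, 64)
print(linear_attention_reference(q, k, v).shape)  # torch.Size([2, 4, 128, 64])
```

The Triton kernels target exactly this kind of recurrence, replacing the sequential Python loop with chunkwise-parallel scans and fusing surrounding operations such as normalization and gating, which is where the memory and throughput gains come from.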
Quick Start & Requirements
pip install --no-use-pep517 flash-linear-attention
or install from source for the latest features.
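After installation, the layers are used as regular PyTorch modules. The sketch below follows the usage pattern from the project's documentation, but the module path fla.layers, the MultiScaleRetention class, and its constructor arguments are assumptions that should be checked against the installed version.

```python
import torch
from fla.layers import MultiScaleRetention  # assumed module path; verify against the installed package

batch_size, num_heads, seq_len, hidden_size = 8, 4, 2048, 1024
device, dtype = 'cuda', torch.bfloat16

# Build a single RetNet-style linear-attention layer and a random input batch.
layer = MultiScaleRetention(hidden_size=hidden_size, num_heads=num_heads).to(device=device, dtype=dtype)
x = torch.randn(batch_size, seq_len, hidden_size, device=device, dtype=dtype)

# The layer may return auxiliary outputs (e.g. cached recurrent state) alongside the hidden states.
y, *_ = layer(x)
print(y.shape)  # expected: (batch_size, seq_len, hidden_size)
```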
Highlighted Details

Trained models can be evaluated on downstream tasks via lm-evaluation-harness.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats