metal-flash-attention by philipturner

Metal port of FlashAttention for Apple silicon

created 2 years ago
511 stars

Top 62.0% on sourcepulse

View on GitHub
Project Summary

This repository provides a Metal port of FlashAttention, optimized for Apple Silicon. It targets researchers and developers working with large language models on Apple hardware, offering a performant and memory-efficient implementation of the attention mechanism.

How It Works

The port focuses on single-headed attention, optimized closely for Metal and the Apple GPU architecture. It addresses register pressure bottlenecks through novel blocking strategies and intentional register spilling, achieving high ALU utilization. The backward pass is redesigned with a higher compute cost but better parallelization, avoiding the problematic emulation of FP32 atomics on Apple hardware.
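
For orientation, the following is a minimal CPU-only Swift sketch of the computation those kernels perform: standard single-headed, scaled-dot-product attention. It is a reference for the math only, not this library's API, and every name in it is illustrative.

```swift
import Foundation

// Reference sketch (not this library's API): single-headed,
// scaled-dot-product attention written as plain CPU code.
// Q, K, V have shape [sequenceLength][headDim]; the Metal kernels
// evaluate the same math block-by-block to control register pressure.
func referenceAttention(q: [[Float]], k: [[Float]], v: [[Float]]) -> [[Float]] {
    let rows = q.count
    let cols = k.count
    let headDim = q[0].count
    let scale = 1.0 / Float(headDim).squareRoot()
    var output = [[Float]](repeating: [Float](repeating: 0, count: headDim),
                           count: rows)

    for r in 0..<rows {
        // Scores for one query row: S[c] = (Q[r] . K[c]) / sqrt(D).
        var scores = [Float](repeating: 0, count: cols)
        for c in 0..<cols {
            var dot: Float = 0
            for d in 0..<headDim { dot += q[r][d] * k[c][d] }
            scores[c] = dot * scale
        }

        // Numerically stable softmax over the row.
        let maxScore = scores.max() ?? 0
        var weights = scores.map { Float(exp(Double($0 - maxScore))) }
        let sum = weights.reduce(0, +)
        weights = weights.map { $0 / sum }

        // O[r] = softmax(S) . V
        for c in 0..<cols {
            for d in 0..<headDim { output[r][d] += weights[c] * v[c][d] }
        }
    }
    return output
}
```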

Quick Start & Requirements

  • Install via Swift Package Manager (a manifest sketch follows this list) or by cloning the repository.
  • Requires macOS and Xcode.
  • Compile with -Xswiftc -Ounchecked for performance.
  • See Usage for detailed setup.
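
Below is a minimal sketch of adding the package through Swift Package Manager, as referenced in the list above. The repository URL is inferred from the project name; the "FlashAttention" product name and the "main" branch are assumptions, so consult the project's own Package.swift and Usage notes for the exact identifiers.

```swift
// swift-tools-version:5.9
// Minimal Package.swift sketch. Product name and branch are assumptions;
// check the repository's own Package.swift for the exact values.
import PackageDescription

let package = Package(
    name: "AttentionDemo",
    dependencies: [
        .package(
            url: "https://github.com/philipturner/metal-flash-attention",
            branch: "main"
        )
    ],
    targets: [
        .executableTarget(
            name: "AttentionDemo",
            dependencies: [
                // Product name assumed; adjust to match the package manifest.
                .product(name: "FlashAttention", package: "metal-flash-attention")
            ]
        )
    ]
)

// Build with the flag noted in the list above, e.g.:
//   swift build -c release -Xswiftc -Ounchecked
```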

Highlighted Details

  • Achieves 4400 gigainstructions per second on M1 Max.
  • Backward pass uses less memory than the official implementation.
  • Novel backward pass design with 100% parallelization efficiency.
  • Optimized register spilling for large head dimensions.

Maintenance & Community

  • Maintained by philipturner.
  • No explicit community links (Discord/Slack) are provided in the README.

Licensing & Compatibility

  • The README does not explicitly state a license.

Limitations & Caveats

  • Currently supports only single-headed attention.
  • BF16 emulation is used, which may incur overhead on older chips.
  • Published benchmarks indicate that large head dimensions (D = 256) are problematic on some Nvidia hardware; this port's register-spilling strategy aims to handle that regime well on Apple Silicon.
Health Check

  • Last commit: 10 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 30 stars in the last 90 days

Explore Similar Projects

Starred by Andrej Karpathy (Founder of Eureka Labs; formerly at Tesla, OpenAI; author of CS 231n), Jiayi Pan (author of SWE-Gym; AI researcher at UC Berkeley), and 16 more.

flash-attention by Dao-AILab

  • Fast, memory-efficient attention implementation
  • Top 0.7% on sourcepulse · 19k stars
  • created 3 years ago · updated 18 hours ago