native-sparse-attention-pytorch by lucidrains

Sparse attention implementation from DeepSeek's research paper

Created 10 months ago
791 stars

Top 44.5% on SourcePulse

1 Expert Loves This Project
Project Summary

This repository provides a PyTorch implementation of the sparse attention mechanism from DeepSeek's paper "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention", designed to improve the efficiency of Transformer models. It targets researchers and engineers working on large language models or long-sequence tasks who need to reduce the quadratic cost of standard self-attention.

How It Works

The implementation follows the paper's hierarchical sparse attention scheme: each query attends to coarse, compressed summaries of key/value blocks, to a small number of selected fine-grained blocks, and to a local sliding window, with the branches combined through learned gating. Because attention scores are computed only for this strategically chosen subset of token pairs rather than for all possible pairs, attention cost grows far more slowly with sequence length, letting the model handle longer sequences efficiently without significant degradation in quality.
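The repository itself ships optimized kernels, but the selection idea described above can be sketched in a few lines of plain PyTorch. The snippet below is a simplified, non-causal illustration only: it uses mean-pooled block summaries as a crude stand-in for the paper's compression branch, omits the sliding-window and gating components, and its function name and defaults are hypothetical rather than part of the library's API.

    import torch

    def toy_block_sparse_attention(q, k, v, block_size=16, top_k_blocks=4):
        # q, k, v: (batch, seq_len, dim); seq_len must divide evenly into blocks
        b, n, d = q.shape
        num_blocks = n // block_size

        # Coarse pass: score each query against a mean-pooled summary of every key block
        k_block_summary = k.view(b, num_blocks, block_size, d).mean(dim=2)   # (b, nb, d)
        coarse_scores = torch.einsum('bqd,bkd->bqk', q, k_block_summary)     # (b, n, nb)

        # Keep only the top-k blocks per query and mask out the rest
        top_blocks = coarse_scores.topk(min(top_k_blocks, num_blocks), dim=-1).indices
        block_mask = torch.zeros(b, n, num_blocks, dtype=torch.bool, device=q.device)
        block_mask.scatter_(-1, top_blocks, True)
        token_mask = block_mask.repeat_interleave(block_size, dim=-1)        # (b, n, n)

        # Fine pass: exact attention restricted to the selected blocks
        scores = torch.einsum('bqd,bkd->bqk', q, k) / d ** 0.5
        scores = scores.masked_fill(~token_mask, float('-inf'))
        return torch.einsum('bqk,bkd->bqd', scores.softmax(dim=-1), v)

    q = k = v = torch.randn(1, 128, 64)
    out = toy_block_sparse_attention(q, k, v)   # (1, 128, 64)

Each query still computes exact attention, just over at most top_k_blocks * block_size keys instead of the full sequence, which is where the savings come from.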

Quick Start & Requirements

  • Install: pip install native-sparse-attention-pytorch
  • Prerequisites: PyTorch. The Enwik8 training example additionally requires wandb.
  • Example: The README includes a usage example (sketched below) and instructions for running an Enwik8 language-modeling experiment.
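Based on the README's usage example, a minimal call looks roughly like the following. Treat it as a sketch: argument names beyond those listed under Highlighted Details below (for example selection_block_size) and the specific values shown are assumptions to verify against the README.

    import torch
    from native_sparse_attention_pytorch import SparseAttention

    # Illustrative configuration; check the README for the current signature.
    attn = SparseAttention(
        dim = 512,                 # model dimension
        dim_head = 64,             # dimension per attention head
        heads = 8,                 # number of attention heads
        sliding_window_size = 2,   # local sliding-window branch
        compress_block_size = 4,   # block size for the compressed (coarse) branch
        selection_block_size = 4,  # block size for fine-grained selection (assumed name)
        num_selected_blocks = 2    # how many fine blocks each query attends to
    )

    tokens = torch.randn(2, 31, 512)   # (batch, sequence, dim)
    attended = attn(tokens)            # output has the same shape as the input

    assert attended.shape == tokens.shape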

Highlighted Details

  • Implements the "Native Sparse Attention" pattern from the DeepSeek team.
  • Offers configurable parameters for sparsity, including sliding_window_size, compress_block_size, and num_selected_blocks.
  • Includes an example for language modeling on Enwik8.
  • Cites relevant research papers on sparse attention and computational complexity.

Maintenance & Community

The project is maintained by lucidrains, with contributions acknowledged from Phil Tillet, Mr-Grin, and Eric Pasewark. No specific community channels (Discord, Slack) are listed.

Licensing & Compatibility

The README reviewed here does not state a license; verify the repository's LICENSE file before commercial use or integration into closed-source projects.

Limitations & Caveats

The README does not report performance gains or benchmarks against standard attention or other sparse attention implementations. The lack of a clearly stated license in the README is also a caveat for adoption until verified.

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 30 days

Explore Similar Projects

Starred by Jiayi Pan (Author of SWE-Gym; MTS at xAI), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 1 more.

DeepSeek-V3.2-Exp by deepseek-ai

1.0%
1k
Experimental LLM boosting long-context efficiency
Created 3 months ago
Updated 1 month ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Jeff Hammerbacher (Cofounder of Cloudera), and 14 more.

flashinfer by flashinfer-ai

3.5%
5k
Kernel library for LLM serving
Created 2 years ago
Updated 10 hours ago