Flowformer by thuml

Code release for the ICML 2022 paper on linearizing Transformers with Flow-Attention

created 3 years ago
323 stars

Top 85.3% on sourcepulse

Project Summary

Flowformer addresses the quadratic complexity bottleneck of Transformer attention mechanisms by introducing a novel "Flow-Attention" design. This approach linearizes attention complexity with respect to sequence length, enabling Transformers to handle significantly longer sequences and scale to larger models. It is designed for researchers and practitioners working with long sequences, computer vision, natural language processing, time series analysis, and reinforcement learning.

How It Works

Flowformer frames the attention mechanism as a flow network, where information propagates from "sources" (values) to "sinks" (results) governed by learned "flow capacities" (attention weights). By enforcing conservation principles at both source and sink, the design introduces competition among attention allocations, preventing trivial attention patterns and improving efficiency. This theoretical grounding in flow networks allows for a task-agnostic design.
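
For intuition, here is a minimal sketch of a non-causal Flow-Attention forward pass in PyTorch. It is an illustration under stated assumptions, not the repository's exact code: it assumes sigmoid as the non-negative feature map, inputs shaped [batch, heads, length, dim], and the function name flow_attention and exact scaling constants are ours; the official task folders add further scaling, causal-masking, and numerical-stability details.

    import torch

    def flow_attention(q, k, v, eps=1e-6):
        # Minimal non-causal Flow-Attention sketch (not the official implementation).
        # Shapes: q, k, v are [batch, heads, length, dim].
        # Non-negative feature map so "flow capacities" are valid (the paper uses sigmoid).
        q, k = torch.sigmoid(q), torch.sigmoid(k)
        # Incoming flow of each sink i:   I_i = phi(q_i) . sum_j phi(k_j)
        sink_in = torch.einsum("bhld,bhd->bhl", q, k.sum(dim=2)) + eps
        # Outgoing flow of each source j: O_j = phi(k_j) . sum_i phi(q_i)
        source_out = torch.einsum("bhld,bhd->bhl", k, q.sum(dim=2)) + eps
        # Conservation: recompute each flow with the opposite side normalized to unit flow.
        conserved_sink = torch.einsum("bhld,bhd->bhl", q, (k / source_out[..., None]).sum(dim=2)) + eps
        conserved_source = torch.einsum("bhld,bhd->bhl", k, (q / sink_in[..., None]).sum(dim=2)) + eps
        # Competition among sources and allocation among sinks: the re-weighting that
        # prevents trivial, uniform attention patterns.
        allocation = torch.sigmoid(conserved_sink)                          # per-sink gate
        competition = torch.softmax(conserved_source, dim=-1) * k.shape[2]  # per-source weight
        # Linear-complexity aggregation: phi(K)^T V is [dim, dim], independent of length.
        kv = torch.einsum("bhld,bhle->bhde", k, v * competition[..., None])
        out = torch.einsum("bhld,bhde->bhle", q, kv) / sink_in[..., None]
        return out * allocation[..., None]

    # Toy usage: batch 2, 4 heads, 1024 tokens, 64-dim heads -> output keeps the same shape.
    q, k, v = (torch.randn(2, 4, 1024, 64) for _ in range(3))
    print(flow_attention(q, k, v).shape)  # torch.Size([2, 4, 1024, 64])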

Quick Start & Requirements

  • Install: Refer to individual task folders (e.g., Flowformer_LRA, Flowformer_CV) for specific setup and execution instructions.
  • Prerequisites: PyTorch; specific tasks may require additional libraries and datasets. A CUDA speed-up version is noted as a planned feature.
  • Resources: Environment configuration can be challenging; users are encouraged to seek community support.

Highlighted Details

  • Achieves linear complexity ($O(N)$) in sequence length, whereas vanilla Transformers hit Out-Of-Memory (OOM) errors on long sequences (see the complexity note after this list).
  • Demonstrates strong performance across diverse domains: Long Sequence Modeling (LRA), Vision Recognition (ImageNet-1K), Language Modeling (WikiText-103), Time Series Classification (UEA), and Offline Reinforcement Learning (D4RL).
  • Introduces Mobile-Attention (ICML 2024), a follow-up design optimized for mobile devices.
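
Where the linear complexity comes from (a standard reassociation argument, with $N$ the sequence length and $d$ the head dimension): vanilla attention computes $\mathrm{softmax}(QK^{\top})V$, which materializes an $N \times N$ score matrix and costs $O(N^{2}d)$ time; Flow-Attention's kernelized form can instead be grouped as $\phi(Q)\,(\phi(K)^{\top}V)$, where $\phi(K)^{\top}V$ is only $d \times d$, so the whole pass runs in $O(Nd^{2})$ time and never forms the $N \times N$ matrix.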

Maintenance & Community

Licensing & Compatibility

  • The README does not explicitly state a license; the code is released for research purposes.

Limitations & Caveats

  • A CUDA speed-up version is listed as a planned feature but not yet implemented.
  • Users may encounter difficulties with environment setup across different tasks.
Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 4 stars in the last 90 days
