Flowformer by thuml

Code release for the ICML 2022 paper on linearizing Transformers with Flow-Attention

created 3 years ago
323 stars

Top 85.3% on sourcepulse

Project Summary

Flowformer addresses the quadratic complexity bottleneck of Transformer attention mechanisms by introducing a novel "Flow-Attention" design. This approach linearizes attention complexity with respect to sequence length, enabling Transformers to handle significantly longer sequences and scale to larger models. It is designed for researchers and practitioners working with long sequences, computer vision, natural language processing, time series analysis, and reinforcement learning.

How It Works

Flowformer frames the attention mechanism as a flow network, where information propagates from "sources" (values) to "sinks" (results) governed by learned "flow capacities" (attention weights). By enforcing conservation principles at both source and sink, the design introduces competition among attention allocations, preventing trivial attention patterns and improving efficiency. This theoretical grounding in flow networks allows for a task-agnostic design.
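
For intuition, here is a minimal sketch of a non-causal Flow-Attention forward pass in PyTorch. It is an illustration under stated assumptions, not the repository's exact code: it assumes sigmoid as the non-negative feature map, inputs shaped [batch, heads, length, dim], and the function name flow_attention and exact scaling constants are ours; the official task folders add further scaling, causal-masking, and numerical-stability details.

    import torch

    def flow_attention(q, k, v, eps=1e-6):
        # Minimal non-causal Flow-Attention sketch (not the official implementation).
        # Shapes: q, k, v are [batch, heads, length, dim].
        # Non-negative feature map so "flow capacities" are valid (the paper uses sigmoid).
        q, k = torch.sigmoid(q), torch.sigmoid(k)
        # Incoming flow of each sink i:   I_i = phi(q_i) . sum_j phi(k_j)
        sink_in = torch.einsum("bhld,bhd->bhl", q, k.sum(dim=2)) + eps
        # Outgoing flow of each source j: O_j = phi(k_j) . sum_i phi(q_i)
        source_out = torch.einsum("bhld,bhd->bhl", k, q.sum(dim=2)) + eps
        # Conservation: recompute each flow with the opposite side normalized to unit flow.
        conserved_sink = torch.einsum("bhld,bhd->bhl", q, (k / source_out[..., None]).sum(dim=2)) + eps
        conserved_source = torch.einsum("bhld,bhd->bhl", k, (q / sink_in[..., None]).sum(dim=2)) + eps
        # Competition among sources and allocation among sinks: the re-weighting that
        # prevents trivial, uniform attention patterns.
        allocation = torch.sigmoid(conserved_sink)                          # per-sink gate
        competition = torch.softmax(conserved_source, dim=-1) * k.shape[2]  # per-source weight
        # Linear-complexity aggregation: phi(K)^T V is [dim, dim], independent of length.
        kv = torch.einsum("bhld,bhle->bhde", k, v * competition[..., None])
        out = torch.einsum("bhld,bhde->bhle", q, kv) / sink_in[..., None]
        return out * allocation[..., None]

    # Toy usage: batch 2, 4 heads, 1024 tokens, 64-dim heads -> output keeps the same shape.
    q, k, v = (torch.randn(2, 4, 1024, 64) for _ in range(3))
    print(flow_attention(q, k, v).shape)  # torch.Size([2, 4, 1024, 64])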

Quick Start & Requirements

  • Install: Refer to individual task folders (e.g., Flowformer_LRA, Flowformer_CV) for specific setup and execution instructions.
  • Prerequisites: PyTorch; specific tasks may require additional libraries and datasets. A CUDA speed-up version is noted as a planned feature.
  • Resources: Environment configuration can be challenging; users are encouraged to seek community support.

Highlighted Details

  • Achieves linear complexity ($O(N)$) in sequence length, whereas vanilla Transformers hit Out-Of-Memory (OOM) errors on long sequences (see the complexity note after this list).
  • Demonstrates strong performance across diverse domains: Long Sequence Modeling (LRA), Vision Recognition (ImageNet-1K), Language Modeling (WikiText-103), Time Series Classification (UEA), and Offline Reinforcement Learning (D4RL).
  • Introduces Mobile-Attention (ICML 2024), a follow-up design optimized for mobile devices.
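
Where the linear complexity comes from (a standard reassociation argument, with $N$ the sequence length and $d$ the head dimension): vanilla attention computes $\mathrm{softmax}(QK^{\top})V$, which materializes an $N \times N$ score matrix and costs $O(N^{2}d)$ time; Flow-Attention's kernelized form can instead be grouped as $\phi(Q)\,(\phi(K)^{\top}V)$, where $\phi(K)^{\top}V$ is only $d \times d$, so the whole pass runs in $O(Nd^{2})$ time and never forms the $N \times N$ matrix.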

Maintenance & Community

Licensing & Compatibility

  • The README does not explicitly state a license; the code is released for research purposes.

Limitations & Caveats

  • A CUDA speed-up version is listed as a planned feature but not yet implemented.
  • Users may encounter difficulties with environment setup across different tasks.
Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 4 stars in the last 90 days
