Research code for the paper "Flowformer: Linearizing Transformers with Conservation Flows" (ICML 2022)
Flowformer addresses the quadratic complexity bottleneck of Transformer attention mechanisms by introducing a novel "Flow-Attention" design. This approach linearizes attention complexity with respect to sequence length, enabling Transformers to handle significantly longer sequences and scale to larger models. It is designed for researchers and practitioners working with long sequences, computer vision, natural language processing, time series analysis, and reinforcement learning.
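To see why the cost becomes linear in sequence length, note that with a non-negative feature map in place of softmax, the L x L score matrix never needs to be materialized: matrix associativity lets the value aggregation run in O(L·d²) rather than O(L²·d). Below is a minimal sketch of that reordering; the names `phi_q`/`phi_k` and the random feature values are illustrative assumptions, not the repository's API.

```python
import torch

L, d = 1024, 64
phi_q, phi_k = torch.rand(L, d), torch.rand(L, d)  # non-negative feature maps of Q and K
v = torch.randn(L, d)

# Quadratic ordering: (phi_q @ phi_k.T) @ v materializes an L x L matrix -> O(L^2 * d).
quadratic = (phi_q @ phi_k.T) @ v

# Linear ordering: phi_q @ (phi_k.T @ v) keeps only a d x d intermediate -> O(L * d^2).
linear = phi_q @ (phi_k.T @ v)

assert torch.allclose(quadratic, linear, rtol=1e-4, atol=1e-2)  # same result, cheaper path
```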
How It Works
Flowformer frames the attention mechanism as a flow network, where information propagates from "sources" (values) to "sinks" (results) governed by learned "flow capacities" (attention weights). By enforcing conservation principles at both source and sink, the design introduces competition among attention allocations, preventing trivial attention patterns and improving efficiency. This theoretical grounding in flow networks allows for a task-agnostic design.
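The following is a minimal single-head PyTorch sketch of this flow-network view, written from the paper's description; the `flow_attention` function, tensor shapes, and the sigmoid feature map are assumptions for illustration, not the repository's exact implementation.

```python
import torch

def flow_attention(q, k, v, eps=1e-6):
    """Illustrative linear-complexity Flow-Attention (single head).

    q, k: (batch, length, dim); v: (batch, length, dim_v).
    """
    # Non-negative "flow capacities" via a sigmoid feature map.
    q, k = torch.sigmoid(q), torch.sigmoid(k)
    # Incoming flow at each sink i: phi(q_i) . sum_j phi(k_j)
    incoming = torch.einsum("bld,bd->bl", q, k.sum(dim=1)) + eps
    # Outgoing flow from each source j: phi(k_j) . sum_i phi(q_i)
    outgoing = torch.einsum("bld,bd->bl", k, q.sum(dim=1)) + eps
    # Conservation: re-measure each side's flow after normalizing the other side.
    conserved_in = torch.einsum("bld,bd->bl", q, (k / outgoing[..., None]).sum(dim=1))
    conserved_out = torch.einsum("bld,bd->bl", k, (q / incoming[..., None]).sum(dim=1))
    # Competition among sources (softmax) and allocation at sinks (sigmoid).
    competition = torch.softmax(conserved_out, dim=-1) * k.shape[1]
    allocation = torch.sigmoid(conserved_in)
    # Linear-attention ordering: build K^T V first, so cost stays O(L * d^2).
    kv = torch.einsum("bld,ble->bde", k, v * competition[..., None])
    out = torch.einsum("bld,bde->ble", q / incoming[..., None], kv)
    return out * allocation[..., None]

# Example: a length-1024 input never materializes a 1024 x 1024 attention matrix.
q = k = torch.randn(2, 1024, 64)
v = torch.randn(2, 1024, 64)
print(flow_attention(q, k, v).shape)  # torch.Size([2, 1024, 64])
```

Because the source-side softmax and sink-side sigmoid both act on conserved flows with a fixed total budget, attention weights must compete for capacity instead of collapsing to a uniform, trivial pattern.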
Quick Start & Requirements
See the task-specific folders (e.g., Flowformer_LRA, Flowformer_CV) for setup and execution instructions.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats