SwiftFormer by Amshaker

Vision transformer research paper focusing on efficient mobile applications

created 2 years ago
290 stars

Project Summary

SwiftFormer introduces an efficient additive attention mechanism designed to overcome the quadratic complexity of standard self-attention, enabling real-time performance on mobile devices. Targeting computer vision applications, it offers a compelling speed-accuracy trade-off for tasks like image classification, detection, and segmentation on resource-constrained platforms.
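
To make the quadratic-versus-linear claim concrete, the sketch below counts multiply-accumulates (MACs) for the token-mixing step only. It assumes standard attention costs one Q·Kᵀ product plus one attention-times-V product, and that the additive variant needs one score per token, one weighted sum of queries and one element-wise product with the keys. The Q/K/V projections, softmax and normalization are ignored. The 56×56 token grid and width of 48 are illustrative values chosen here, not figures from the paper.

```python
def self_attention_macs(n_tokens: int, dim: int) -> int:
    """Token-mixing MACs of standard self-attention (projections excluded)."""
    qk = n_tokens * n_tokens * dim   # Q @ K^T: (n x d) @ (d x n)
    av = n_tokens * n_tokens * dim   # softmax(QK^T) @ V: (n x n) @ (n x d)
    return qk + av


def additive_attention_macs(n_tokens: int, dim: int) -> int:
    """Token-mixing MACs of an additive (global-query) attention, as assumed here."""
    scores = n_tokens * dim          # Q @ w_a: one scalar score per token
    pooling = n_tokens * dim         # sum_i alpha_i * Q_i: one global query
    interaction = n_tokens * dim     # global query (broadcast) * K, element-wise
    return scores + pooling + interaction


n = 56 * 56  # illustrative: a stride-4 feature map of a 224x224 input
d = 48       # illustrative channel width
print(self_attention_macs(n, d))      # 944111616
print(additive_attention_macs(n, d))  # 451584
```

Doubling the number of tokens quadruples the first count but only doubles the second. This is why attention of this form stays affordable on the high-resolution early stages.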

How It Works

The core innovation is an efficient additive attention mechanism. It replaces the quadratic query-key matrix multiplication with element-wise operations whose cost grows linearly with the number of tokens. Using learned per-token weights, the query matrix is pooled into a single global query vector. That vector is then multiplied element-wise with the key matrix to produce a global context representation. Because this linear-cost attention is cheap enough to use in every network stage, SwiftFormer does not need to confine attention to the final, low-resolution stages, as earlier hybrid CNN-Transformer models did, and it does so without sacrificing accuracy.
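
A minimal NumPy sketch of the steps described above, for a single image with tokens as rows. The weight names (`w_q`, `w_k`, `w_a`, `w_t`) are hypothetical. The choices to normalize the per-token scores with an L2 norm across tokens and to add the queries back as a residual are assumptions of this sketch. The official SwiftFormer implementation may differ in its normalization and projection details, so treat this as an illustration of the idea, not the project's code.

```python
import numpy as np


def efficient_additive_attention(x, w_q, w_k, w_a, w_t, eps=1e-12):
    """Additive attention sketch (hypothetical weights). x: (n, d_in) -> (n, d)."""
    q = x @ w_q                                   # (n, d) queries
    k = x @ w_k                                   # (n, d) keys
    d = q.shape[1]
    # One scalar score per token from a learnable vector w_a, scaled by sqrt(d).
    alpha = (q @ w_a) / np.sqrt(d)                # (n,)
    # Normalize scores across tokens (L2 here, an assumption of this sketch).
    alpha = alpha / max(np.linalg.norm(alpha), eps)
    # Pool all query rows into a single global query vector: O(n * d).
    global_q = (alpha[:, None] * q).sum(axis=0)   # (d,)
    # Element-wise interaction: broadcast the global query over every key: O(n * d).
    context = k * global_q                        # (n, d)
    # Linear transform of the global context plus a residual connection to the queries.
    return context @ w_t + q


rng = np.random.default_rng(0)
n, d_in, d = 196, 64, 32   # illustrative sizes, e.g. a 14x14 token grid
x = rng.standard_normal((n, d_in))
w_q = rng.standard_normal((d_in, d))
w_k = rng.standard_normal((d_in, d))
w_a = rng.standard_normal(d)
w_t = rng.standard_normal((d, d))
out = efficient_additive_attention(x, w_q, w_k, w_a, w_t)
print(out.shape)  # (196, 32)
```

No step builds an n × n matrix. The global query is a weighted sum over all tokens, so each output row still sees global context while memory and compute stay linear in n.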

Quick Start & Requirements

  • Install: A conda environment is recommended, pinned to PyTorch 1.11.0+cu113 and coremltools==5.2.0.
  • Prerequisites: Python 3.9, PyTorch 1.11.0 with CUDA 11.3, timm, coremltools. ImageNet dataset required for training/evaluation.
  • Setup: Requires downloading ImageNet and setting up a conda environment. Training scripts are provided for multi-GPU and multi-node setups.
  • Links: Official ICCV'23 Paper, Codebase
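
Because the setup depends on exact versions, a quick sanity check before training or exporting can save time. The sketch below compares installed versions against the pins listed above using the standard-library `importlib.metadata`. The `PINS` dictionary and `check_environment` helper are hypothetical names introduced here. The repository's own requirements may pin more packages than the two shown.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Pins taken from the summary above (hypothetical helper, not part of the repo).
PINS = {"torch": "1.11.0+cu113", "coremltools": "5.2.0"}


def check_environment(pins=PINS, get_version=version):
    """Return {name: (expected, found)} for every pin that does not match."""
    problems = {}
    for pkg, expected in pins.items():
        try:
            found = get_version(pkg)
        except PackageNotFoundError:
            found = None  # package not installed at all
        if found != expected:
            problems[pkg] = (expected, found)
    # The summary lists Python 3.9 as a prerequisite.
    if sys.version_info[:2] != (3, 9):
        problems["python"] = ("3.9", f"{sys.version_info.major}.{sys.version_info.minor}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for name, (want, got) in issues.items():
        print(f"{name}: expected {want}, found {got}")
    print("environment OK" if not issues else f"{len(issues)} mismatch(es)")
```

Passing `get_version` as a parameter keeps the check testable without the real packages installed.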

Highlighted Details

  • SwiftFormer-S achieves 78.5% top-1 accuracy on ImageNet-1K with 0.8 ms latency on iPhone 14, 2x faster than MobileViT-v2.
  • Models range from SwiftFormer-XS (3.5M params, 0.6 GMACs) to SwiftFormer-L3 (28.5M params, 4.0 GMACs).
  • Latency measurements provided for iPhone 14 Neural Engine and Samsung Galaxy S23 Ultra (Snapdragon 8 Gen 2).
  • Codebase is based on LeViT and EfficientFormer.

Maintenance & Community

  • Project accepted at ICCV 2023.
  • Contact available via GitHub issues or email.
  • Related works include EdgeNeXt.

Licensing & Compatibility

  • The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The repository does not explicitly state a license, which may hinder commercial adoption. Instructions for exporting and profiling models on different mobile platforms are referenced in GitHub issues rather than documented in the README, so deployment may take extra effort.

Health Check

  • Last commit: 2 weeks ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 12 stars in the last 90 days