Vision transformer research paper focusing on efficient mobile applications
Top 91.7% on sourcepulse
SwiftFormer introduces an efficient additive attention mechanism designed to overcome the quadratic complexity of standard self-attention, enabling real-time performance on mobile devices. Targeting computer vision applications, it offers a compelling speed-accuracy trade-off for tasks like image classification, detection, and segmentation on resource-constrained platforms.
How It Works
The core innovation is an additive attention mechanism that replaces expensive quadratic matrix multiplications with linear element-wise operations. This is achieved by pooling query matrices to produce global queries, which are then element-wise multiplied with key matrices to derive a global context representation. This linear approach allows the attention mechanism to be integrated across all network stages without sacrificing accuracy, unlike previous hybrid CNN-Transformer models.
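To make the linear-complexity idea concrete, below is a minimal PyTorch sketch of an efficient additive attention block in the spirit described above. It is an illustrative re-implementation, not the repository's exact code; the module name, the learned scoring vector `w_g`, and the final projection are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EfficientAdditiveAttention(nn.Module):
    """Sketch of additive attention with cost linear in the token count."""
    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.w_g = nn.Parameter(torch.randn(dim, 1))  # learned scoring vector (assumed)
        self.scale = dim ** -0.5
        self.proj = nn.Linear(dim, dim)

    def forward(self, x):                       # x: (B, N, D)
        q = F.normalize(self.to_q(x), dim=-1)   # query tokens
        k = F.normalize(self.to_k(x), dim=-1)   # key tokens
        # Score each query with the learned vector and pool into one global
        # query -- this replaces the N x N attention matrix of self-attention.
        attn = torch.softmax((q @ self.w_g) * self.scale, dim=1)   # (B, N, 1)
        global_q = (attn * q).sum(dim=1, keepdim=True)             # (B, 1, D)
        # Element-wise interaction between the global query and every key,
        # i.e. linear rather than quadratic in N.
        context = global_q * k                                     # (B, N, D)
        return self.proj(context) + q                              # residual with queries
```

Applied to a `(batch, tokens, dim)` tensor, the block returns a tensor of the same shape, so it can slot into each network stage like a standard attention layer.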
Quick Start & Requirements
Setup uses a conda environment with a specific PyTorch build (1.11.0+cu113) and coremltools==5.2.0. Key dependencies are timm and coremltools. The ImageNet dataset is required for training and evaluation.
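Since coremltools is pinned for mobile export, a minimal conversion sketch is shown below. It uses a placeholder convolution so it runs stand-alone; in practice you would substitute a SwiftFormer model built from the repository's model definitions (the exact constructor name is not shown here and is an assumption).

```python
import torch
import coremltools as ct

# Placeholder module so the sketch is self-contained; swap in a SwiftFormer
# model from the repository's model definitions for a real export.
model = torch.nn.Conv2d(3, 8, kernel_size=3).eval()

example = torch.randn(1, 3, 224, 224)            # ImageNet-sized input
traced = torch.jit.trace(model, example)         # TorchScript graph for conversion
mlmodel = ct.convert(traced, inputs=[ct.TensorType(shape=example.shape)])
mlmodel.save("swiftformer.mlmodel")              # Core ML artifact for on-device profiling
```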
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The repository does not explicitly state a license, which may hinder commercial adoption. Instructions for exporting and profiling models on specific mobile platforms appear only in the issue tracker, suggesting that deployment may require extra effort.