EdgeNeXt by mmaaz60

Vision architecture research paper

Created 3 years ago

406 stars

Top 71.6% on SourcePulse

1 Expert Loves This Project

Ross Wightman

Author of timm; CV at Hugging Face

Project Summary

EdgeNeXt is an efficient hybrid neural network architecture that combines the strengths of Convolutional Neural Networks (CNNs) and Transformers for mobile vision applications. It targets researchers and developers building resource-constrained computer vision systems, and reports higher accuracy at lower computational cost than comparable lightweight models such as MobileViT.

How It Works

EdgeNeXt introduces a novel "split depth-wise transpose attention" (SDTA) encoder. This mechanism splits input tensors into multiple channel groups and applies depth-wise convolution alongside self-attention across channel dimensions. Doing so implicitly increases the receptive field and encodes multi-scale features efficiently. The aim is to pair the local feature extraction of CNNs with the global context modeling of Transformers, without the high computational cost typically associated with pure Transformer models.
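To make "self-attention across channel dimensions" concrete, here is a minimal, dependency-free sketch of channel-wise (transposed) attention. It is an illustration under stated assumptions, not the repository's implementation: the function names are hypothetical, the L2 normalization and `temperature` scaling are assumed details borrowed from common channel-attention designs, and the channel splitting and depth-wise convolution of the full SDTA encoder are left out.

```python
import math

def l2_normalize(row):
    """Scale a vector to unit L2 norm (epsilon avoids division by zero)."""
    norm = math.sqrt(sum(x * x for x in row)) + 1e-12
    return [x / norm for x in row]

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def channel_attention(q, k, v, temperature=1.0):
    """q, k, v are N x C (tokens x channels).

    Returns the N x C output and the C x C attention map.
    Normalization and temperature are assumptions for this sketch.
    """
    qt = [l2_normalize(r) for r in transpose(q)]   # C x N
    kt = [l2_normalize(r) for r in transpose(k)]   # C x N
    vt = transpose(v)                              # C x N
    scores = matmul(qt, transpose(kt))             # C x C, independent of N
    attn = [softmax([s * temperature for s in row]) for row in scores]
    out_t = matmul(attn, vt)                       # C x N
    return transpose(out_t), attn

# Four tokens with two channels each (toy values).
tokens = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]
out, attn = channel_attention(tokens, tokens, tokens)
print(len(attn), len(attn[0]))  # prints "2 2": one row and column per channel
```

Because the attention map is C x C rather than N x N, its size depends on the channel count and not on the number of spatial positions, which is where the cost saving over standard token-to-token self-attention comes from.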

Quick Start & Requirements

Install: pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu113 followed by pip install -r requirements.txt.
Prerequisites: Python 3.8, PyTorch with CUDA 11.3 support, ImageNet-1K dataset.
Evaluation: Download weights (e.g., edgenext_small.pth) and run python main.py --model edgenext_small --eval True --batch_size 16 --data_path <path_to_imagenet> --output_dir <results> --resume edgenext_small.pth.
Links: Official Repository
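
For scripted runs, the evaluation command above can be assembled programmatically. The sketch below only uses the flags quoted in this section; the helper name `build_eval_command` and the paths `/data/imagenet` and `results` are hypothetical placeholders, and actually launching the command assumes you are inside a checkout of the repository with the weights downloaded.

```python
import shlex

def build_eval_command(model, weights, data_path, output_dir, batch_size=16):
    """Assemble the argv list for the evaluation run quoted above.

    Hypothetical helper: it only arranges the flags listed in this section.
    """
    return [
        "python", "main.py",
        "--model", model,
        "--eval", "True",
        "--batch_size", str(batch_size),
        "--data_path", data_path,
        "--output_dir", output_dir,
        "--resume", weights,
    ]

# Placeholder paths; replace with your own ImageNet and output locations.
cmd = build_eval_command(
    "edgenext_small", "edgenext_small.pth", "/data/imagenet", "results"
)
print(shlex.join(cmd))
# To launch it: import subprocess; subprocess.run(cmd, check=True)
```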

Highlighted Details

EdgeNeXt-XX-Small (1.3M params) achieves 71.2% top-1 accuracy on ImageNet-1K with 261M FLOPs, outperforming MobileViT by 2.2 percentage points while using 28% fewer FLOPs.
EdgeNeXt-Base (18.51M params) achieves 82.5% top-1 accuracy on ImageNet-1K.
Pretrained models are available for ImageNet-1K and ImageNet-21K.
The architecture is evaluated on classification, detection, and segmentation tasks.
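
As a quick sanity check on the EdgeNeXt-XX-Small comparison, the figures above imply a MobileViT baseline of roughly 362.5M FLOPs and 69.0% top-1. The short calculation below derives those values; they are implied by the stated numbers, not figures reported in this summary, and it assumes the 2.2 gain is an absolute difference in percentage points.

```python
# Figures quoted above for EdgeNeXt-XX-Small.
edgenext_flops_m = 261.0   # FLOPs, in millions
edgenext_top1 = 71.2       # ImageNet-1K top-1 accuracy, in percent
flops_reduction = 0.28     # "28% fewer FLOPs" than the baseline
accuracy_gain = 2.2        # assumed to be absolute percentage points

# If EdgeNeXt uses 28% fewer FLOPs, it uses 72% of the baseline's FLOPs.
implied_baseline_flops_m = edgenext_flops_m / (1.0 - flops_reduction)
implied_baseline_top1 = edgenext_top1 - accuracy_gain

print(f"{implied_baseline_flops_m:.1f}M FLOPs, {implied_baseline_top1:.1f}% top-1")
```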

Maintenance & Community

The project is the official repository for the EdgeNeXt paper, presented at CADL'22, an ECCV workshop (ECCVW). Author contact information is provided for inquiries.

Licensing & Compatibility

The README does not explicitly state a license. The code is based on the ConvNeXt repository, which is released under the MIT license, but that does not by itself determine the terms for EdgeNeXt. Suitability for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README's install instructions target CUDA 11.3 specifically, so the setup may need adjusting for newer CUDA or PyTorch versions. The primary focus is ImageNet-1K evaluation; performance on other datasets or tasks may require further investigation.

Health Check

Last Commit

2 years ago

Responsiveness

Inactive

Star History

3 stars in the last 30 days