PyTorch code for a CVPR 2025 research paper
This repository provides the official PyTorch implementation of DynamicTanh (DyT), a novel element-wise operation designed to replace normalization layers in Transformer architectures. It targets researchers and practitioners in deep learning, offering a way to achieve comparable or improved performance with potentially simplified models.
How It Works
DyT replaces standard normalization layers with a learnable scaled tanh function: DyT(x) = tanh(αx), where α is a learnable scalar. This approach aims to provide the stabilizing benefits of normalization without the computational overhead or architectural complexity, potentially leading to more efficient and effective models.
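The formula above can be sketched as a drop-in PyTorch module. This is a minimal illustrative version based only on DyT(x) = tanh(αx) with a single learnable scalar α as described here; the official implementation in this repository may differ in details (e.g. per-channel affine parameters), so treat the initialization value and class layout below as assumptions.

```python
import torch
import torch.nn as nn

class DyT(nn.Module):
    """Minimal sketch of DynamicTanh: DyT(x) = tanh(alpha * x),
    with alpha a single learnable scalar (init value is an assumption)."""
    def __init__(self, init_alpha: float = 0.5):
        super().__init__()
        # alpha is learned jointly with the rest of the network
        self.alpha = nn.Parameter(torch.full((1,), init_alpha))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Element-wise: no mean/variance statistics are computed,
        # unlike LayerNorm or BatchNorm
        return torch.tanh(self.alpha * x)

# Usage: place it where a normalization layer would sit
layer = DyT()
x = torch.randn(2, 4, 8)
y = layer(x)
```

Because the operation is purely element-wise, it needs no reduction over batch or feature dimensions, which is the source of the claimed simplification over normalization layers.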
Quick Start & Requirements
conda create -n DyT python=3.12
conda activate DyT
conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.4 -c pytorch -c nvidia
pip install timm==1.0.15 tensorboard
Highlighted Details
The implementation builds on the timm library and the ConvNeXt repository.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The provided training commands reproduce the results on ImageNet-1K; adapting DyT to other tasks or custom models requires following the instructions in the respective folders or the "HowTo" guide. Reproducing the computational efficiency results requires separate steps.
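For adapting DyT to a custom model, one common pattern is to recursively swap existing normalization layers for DyT instances. The sketch below is an assumption about how such a conversion could look, not the repository's "HowTo" procedure; the helper name `replace_layernorm_with_dyt` and the minimal `DyT` class are illustrative.

```python
import torch
import torch.nn as nn

class DyT(nn.Module):
    # Minimal DyT as described above: tanh with a learnable scalar alpha
    # (init value 0.5 is an assumption for illustration)
    def __init__(self, init_alpha: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((1,), init_alpha))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.tanh(self.alpha * x)

def replace_layernorm_with_dyt(module: nn.Module) -> nn.Module:
    """Recursively replace every nn.LayerNorm in a model with a DyT layer.
    Illustrative helper, not part of the official repository API."""
    for name, child in module.named_children():
        if isinstance(child, nn.LayerNorm):
            setattr(module, name, DyT())
        else:
            replace_layernorm_with_dyt(child)
    return module

# Usage on a toy model: the LayerNorm is swapped in place
model = nn.Sequential(nn.Linear(8, 8), nn.LayerNorm(8), nn.Linear(8, 4))
model = replace_layernorm_with_dyt(model)
```

After conversion the model would still need (re)training, since the α parameters are initialized fresh rather than derived from the replaced normalization statistics.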