DyT by jiachenzhu

PyTorch code for a CVPR 2025 research paper

created 4 months ago
987 stars

Top 38.3% on sourcepulse

Project Summary

This repository provides the official PyTorch implementation of DynamicTanh (DyT), a simple element-wise operation designed to replace normalization layers in Transformer architectures. It targets deep learning researchers and practitioners, offering comparable or better performance from models that omit normalization layers entirely.

How It Works

DyT replaces standard normalization layers with a learnable, element-wise scaled tanh: DyT(x) = tanh(αx), where α is a learnable scalar, followed by the same learnable per-channel scale and shift (γ, β) that normalization layers apply. Because the operation is purely element-wise, it computes no batch or token statistics, aiming to retain the stabilizing, extreme-value-squashing effect of normalization without the cost of its reduction operations.
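For concreteness, here is a minimal PyTorch sketch of the operation following the pseudocode in the paper; the α init of 0.5 is the paper's default, but the module name and other details here are illustrative rather than the official implementation:

```python
import torch
import torch.nn as nn

class DyT(nn.Module):
    """Element-wise replacement for LayerNorm: y = gamma * tanh(alpha * x) + beta."""

    def __init__(self, num_features, init_alpha=0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1) * init_alpha)  # learnable scalar
        self.gamma = nn.Parameter(torch.ones(num_features))    # per-channel scale
        self.beta = nn.Parameter(torch.zeros(num_features))    # per-channel shift

    def forward(self, x):
        # Purely element-wise: no mean/variance reductions over batch or tokens.
        return self.gamma * torch.tanh(self.alpha * x) + self.beta
```

Unlike LayerNorm, the forward pass involves no reductions, so it is cheap and trivially parallel; the tanh supplies the bounded, S-shaped response that the paper identifies as the key behavior of trained normalization layers.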

Quick Start & Requirements

  • Install: conda create -n DyT python=3.12; conda activate DyT; conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.4 -c pytorch -c nvidia; pip install timm==1.0.15 tensorboard.
  • Prerequisites: Python 3.12, PyTorch 2.5.1 with CUDA 12.4, timm 1.0.15.
  • Training: Requires the ImageNet-1K dataset; example commands are provided for ViT-B/L and ConvNeXt-B/L.
  • Docs: arXiv, Project Page

Highlighted Details

  • Achieves comparable or better ImageNet-1K top-1 accuracy than LayerNorm (LN) counterparts for ViT and ConvNeXt models:
  • ViT-B: 82.5% (DyT) vs. 82.3% (LN)
  • ViT-L: 83.6% (DyT) vs. 83.1% (LN)
  • ConvNeXt-B: 83.7% (DyT) vs. 83.7% (LN)
  • ConvNeXt-L: 84.4% (DyT) vs. 84.3% (LN)
  • Codebase built upon the timm library and ConvNeXt repository.

Maintenance & Community

  • Authors include researchers from FAIR, NYU, MIT, and Princeton.
  • The project is associated with the CVPR 2025 paper "Transformers without Normalization".

Licensing & Compatibility

  • Released under the MIT license, permitting commercial use and closed-source linking.

Limitations & Caveats

The provided training commands reproduce the ImageNet-1K results; adapting DyT to other tasks or custom models requires following the instructions in the respective task folders or the "HowTo" guide (see the conversion sketch below). The computational-efficiency results require separate reproduction steps.
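As a rough illustration of that adaptation step, here is a sketch of swapping every nn.LayerNorm in an existing model for the DyT module sketched above; the helper name convert_ln_to_dyt is an assumption for illustration, so consult the repository's "HowTo" guide for the official procedure:

```python
import torch.nn as nn

def convert_ln_to_dyt(module: nn.Module) -> nn.Module:
    # Hypothetical helper: recursively replace each nn.LayerNorm with a
    # DyT of matching width. See the repo's HowTo guide for the official steps.
    if isinstance(module, nn.LayerNorm):
        return DyT(module.normalized_shape[-1])  # e.g. (768,) -> 768 for ViT-B
    for name, child in module.named_children():
        setattr(module, name, convert_ln_to_dyt(child))
    return module

# Example usage (assumes timm from the Quick Start above):
# import timm
# model = timm.create_model("vit_base_patch16_224")
# model = convert_ln_to_dyt(model)
```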

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 77 stars in the last 90 days

Starred by Lilian Weng (Cofounder of Thinking Machines Lab), Patrick Kidger (Core Contributor to JAX ecosystem), and 4 more.

Explore Similar Projects

glow by openai

Generative flow research paper code
Top 0.1% · 3k stars · created 7 years ago · updated 1 year ago