DyT by jiachenzhu

PyTorch code for a CVPR 2025 research paper

created 4 months ago
987 stars

Top 38.3% on sourcepulse

Project Summary

This repository provides the official PyTorch implementation of DynamicTanh (DyT), a simple element-wise operation designed to replace normalization layers in Transformer architectures. It targets deep learning researchers and practitioners, offering comparable or better performance from models that omit normalization layers entirely.

How It Works

DyT replaces standard normalization layers with a learnable, element-wise scaled tanh: DyT(x) = tanh(αx), where α is a learnable scalar, followed by the same learnable per-channel scale and shift (γ, β) that normalization layers apply. Because the operation is purely element-wise, it computes no batch or token statistics, aiming to retain the stabilizing, extreme-value-squashing effect of normalization without the cost of its reduction operations.
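For concreteness, here is a minimal PyTorch sketch of the operation following the pseudocode in the paper; the α init of 0.5 is the paper's default, but the module name and other details here are illustrative rather than the official implementation:

```python
import torch
import torch.nn as nn

class DyT(nn.Module):
    """Element-wise replacement for LayerNorm: y = gamma * tanh(alpha * x) + beta."""

    def __init__(self, num_features, init_alpha=0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(1) * init_alpha)  # learnable scalar
        self.gamma = nn.Parameter(torch.ones(num_features))    # per-channel scale
        self.beta = nn.Parameter(torch.zeros(num_features))    # per-channel shift

    def forward(self, x):
        # Purely element-wise: no mean/variance reductions over batch or tokens.
        return self.gamma * torch.tanh(self.alpha * x) + self.beta
```

Unlike LayerNorm, the forward pass involves no reductions, so it is cheap and trivially parallel; the tanh supplies the bounded, S-shaped response that the paper identifies as the key behavior of trained normalization layers.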

Quick Start & Requirements

  • Install: conda create -n DyT python=3.12; conda activate DyT; conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.4 -c pytorch -c nvidia; pip install timm==1.0.15 tensorboard.
  • Prerequisites: Python 3.12, PyTorch 2.5.1 with CUDA 12.4, timm 1.0.15.
  • Training: Requires the ImageNet-1K dataset; example commands are provided for ViT-B/L and ConvNeXt-B/L.
  • Docs: arXiv, Project Page

Highlighted Details

  • Achieves comparable or better ImageNet-1K top-1 accuracy than LayerNorm (LN) counterparts for ViT and ConvNeXt models:
  • ViT-B: 82.5% (DyT) vs. 82.3% (LN)
  • ViT-L: 83.6% (DyT) vs. 83.1% (LN)
  • ConvNeXt-B: 83.7% (DyT) vs. 83.7% (LN)
  • ConvNeXt-L: 84.4% (DyT) vs. 84.3% (LN)
  • Codebase built upon the timm library and ConvNeXt repository.

Maintenance & Community

  • Authors include researchers from FAIR, NYU, MIT, and Princeton.
  • The project is associated with the CVPR 2025 paper "Transformers without Normalization".

Licensing & Compatibility

  • Released under the MIT license, permitting commercial use and closed-source linking.

Limitations & Caveats

The provided training commands reproduce the ImageNet-1K results; adapting DyT to other tasks or custom models requires following the instructions in the respective task folders or the "HowTo" guide (see the conversion sketch below). The computational-efficiency results require separate reproduction steps.
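As a rough illustration of that adaptation step, here is a sketch of swapping every nn.LayerNorm in an existing model for the DyT module sketched above; the helper name convert_ln_to_dyt is an assumption for illustration, so consult the repository's "HowTo" guide for the official procedure:

```python
import torch.nn as nn

def convert_ln_to_dyt(module: nn.Module) -> nn.Module:
    # Hypothetical helper: recursively replace each nn.LayerNorm with a
    # DyT of matching width. See the repo's HowTo guide for the official steps.
    if isinstance(module, nn.LayerNorm):
        return DyT(module.normalized_shape[-1])  # e.g. (768,) -> 768 for ViT-B
    for name, child in module.named_children():
        setattr(module, name, convert_ln_to_dyt(child))
    return module

# Example usage (assumes timm from the Quick Start above):
# import timm
# model = timm.create_model("vit_base_patch16_224")
# model = convert_ln_to_dyt(model)
```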

Health Check

  • Last commit: 4 months ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star history: 77 stars in the last 90 days

Starred by Lilian Weng (Cofounder of Thinking Machines Lab), Patrick Kidger (Core Contributor to JAX ecosystem), and 4 more.

Explore Similar Projects

glow by openai

Generative flow research paper code
Top 0.1% · 3k stars · created 7 years ago · updated 1 year ago