DyT by jiachenzhu

Official PyTorch code for the CVPR 2025 paper "Transformers without Normalization"

Created 11 months ago
1,033 stars

Top 36.1% on SourcePulse

Project Summary

This repository provides the official PyTorch implementation of DynamicTanh (DyT), a novel element-wise operation designed to replace normalization layers in Transformer architectures. It targets researchers and practitioners in deep learning, offering a way to achieve comparable or improved performance with potentially simplified models.

How It Works

DyT replaces standard normalization layers with a learnable, element-wise scaled tanh: DyT(x) = tanh(αx), where α is a learnable scalar, followed by a learnable per-channel scale and shift analogous to the affine parameters of a normalization layer. Because it computes no activation statistics, DyT aims to provide the stabilizing effect of normalization without its computational overhead or architectural complexity, potentially yielding simpler and more efficient models.
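The operation above can be sketched as a small PyTorch module. This is a minimal illustration under the stated formulation, not the repository's official implementation; the default `init_alpha=0.5` is an assumption taken from the paper's reported defaults.

```python
import torch
import torch.nn as nn


class DyT(nn.Module):
    """Minimal sketch of DynamicTanh (DyT), a drop-in LayerNorm replacement.

    A learnable scalar alpha scales the input inside tanh; a per-channel
    scale (gamma) and shift (beta) follow, mirroring a normalization
    layer's affine parameters. No mean/variance statistics are computed.
    """

    def __init__(self, dim: int, init_alpha: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((1,), init_alpha))  # learnable scalar
        self.gamma = nn.Parameter(torch.ones(dim))               # per-channel scale
        self.beta = nn.Parameter(torch.zeros(dim))               # per-channel shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Purely element-wise: tanh bounds activations, alpha controls the
        # input range over which the function stays approximately linear.
        return self.gamma * torch.tanh(self.alpha * x) + self.beta
```

At initialization (gamma = 1, beta = 0) the output is simply tanh(αx), so activations are bounded in (-1, 1), which is the source of the stabilizing effect described above.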

Quick Start & Requirements

  • Install:
    conda create -n DyT python=3.12
    conda activate DyT
    conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.4 -c pytorch -c nvidia
    pip install timm==1.0.15 tensorboard
  • Prerequisites: Python 3.12, PyTorch 2.5.1 with CUDA 12.4, timm 1.0.15.
  • Training: Requires ImageNet-1K dataset. Example commands provided for ViT-B/L and ConvNeXt-B/L.
  • Docs: arXiv, Project Page

Highlighted Details

  • Achieves comparable or better ImageNet-1K top-1 accuracy than layer-normalized (LN) counterparts for ViT and ConvNeXt models:
  • ViT-B: 82.5% (DyT) vs 82.3% (LN)
  • ViT-L: 83.6% (DyT) vs 83.1% (LN)
  • ConvNeXt-B: 83.7% (DyT) vs 83.7% (LN)
  • ConvNeXt-L: 84.4% (DyT) vs 84.3% (LN)
  • Codebase built upon the timm library and ConvNeXt repository.

Maintenance & Community

  • Authors include researchers from FAIR, NYU, MIT, and Princeton.
  • The project is associated with the CVPR 2025 paper "Transformers without Normalization".

Licensing & Compatibility

  • Released under the MIT license, permitting commercial use and closed-source linking.

Limitations & Caveats

The provided training commands reproduce the ImageNet-1K results only; adapting DyT to other tasks or custom models requires following the instructions in the respective folders or the "HowTo" guide. Reproducing the computational-efficiency results requires separate steps.
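For custom models, one common adaptation pattern is to walk the module tree and swap each LayerNorm for a DyT of matching width. This is a hypothetical sketch, not the repository's HowTo code: `convert_ln_to_dyt` and the compact `DyT` class here are illustrative names, not part of the repo's API.

```python
import torch
import torch.nn as nn


class DyT(nn.Module):
    """Compact sketch of DynamicTanh; see the repo for the official module."""

    def __init__(self, dim: int, init_alpha: float = 0.5):
        super().__init__()
        self.alpha = nn.Parameter(torch.full((1,), init_alpha))
        self.gamma = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.gamma * torch.tanh(self.alpha * x) + self.beta


def convert_ln_to_dyt(module: nn.Module) -> nn.Module:
    """Recursively replace every nn.LayerNorm with a DyT of the same width.

    Hypothetical helper: the repo's HowTo guide may prescribe a different
    procedure (e.g. defining DyT layers directly in the model class).
    """
    for name, child in module.named_children():
        if isinstance(child, nn.LayerNorm):
            # normalized_shape is a tuple; its last entry is the channel dim.
            setattr(module, name, DyT(child.normalized_shape[-1]))
        else:
            convert_ln_to_dyt(child)
    return module
```

A converted model would then be trained from scratch, matching how the paper's DyT results are obtained, rather than reusing weights trained with normalization layers.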

Health Check
Last Commit

11 months ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
1 star in the last 30 days

Explore Similar Projects

Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 9 more.

DiT by facebookresearch

8k stars
PyTorch implementation for diffusion models with transformers (DiT)
Created 3 years ago
Updated 1 year ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Soumith Chintala (Coauthor of PyTorch), and 1 more.

jetson-inference by dusty-nv

9k stars
Vision DNN library for NVIDIA Jetson devices
Created 9 years ago
Updated 4 months ago