DyT by jiachenzhu

PyTorch code for a CVPR 2025 research paper

Created 6 months ago
1,013 stars

Top 36.9% on SourcePulse

View on GitHub
Project Summary

This repository provides the official PyTorch implementation of DynamicTanh (DyT), a novel element-wise operation designed to replace normalization layers in Transformer architectures. It targets deep learning researchers and practitioners, offering comparable or improved performance from potentially simpler models.

How It Works

DyT replaces each normalization layer with an element-wise, learnable scaled tanh: DyT(x) = γ · tanh(αx) + β, where α is a learnable scalar and γ, β are learnable per-channel affine parameters like those in a normalization layer. Because the operation is element-wise, it computes no activation statistics (no mean or variance reductions), aiming to keep the stabilizing effect of normalization while removing its computational overhead and architectural complexity.
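A minimal PyTorch sketch of the operation as defined above (the class and parameter names here are illustrative, not necessarily those of the repository's official module):

```python
import torch
import torch.nn as nn

class DyT(nn.Module):
    """Drop-in replacement for a normalization layer: gamma * tanh(alpha * x) + beta."""

    def __init__(self, num_features: int, alpha_init: float = 0.5):
        super().__init__()
        # Single learnable scalar controlling how strongly tanh squashes activations
        self.alpha = nn.Parameter(torch.full((1,), alpha_init))
        # Per-channel affine parameters, mirroring the affine terms of LayerNorm
        self.gamma = nn.Parameter(torch.ones(num_features))
        self.beta = nn.Parameter(torch.zeros(num_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Purely element-wise: no mean/variance statistics are computed
        return self.gamma * torch.tanh(self.alpha * x) + self.beta
```

With inputs shaped (..., num_features), gamma and beta broadcast over the last dimension, exactly where a Transformer's LayerNorm would sit; alpha_init = 0.5 follows the paper's default initialization for α.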

Quick Start & Requirements

  • Install: conda create -n DyT python=3.12; conda activate DyT; conda install pytorch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 pytorch-cuda=12.4 -c pytorch -c nvidia; pip install timm==1.0.15 tensorboard (a quick environment check follows this list).
  • Prerequisites: Python 3.12, PyTorch 2.5.1 with CUDA 12.4, timm 1.0.15.
  • Training: Requires ImageNet-1K dataset. Example commands provided for ViT-B/L and ConvNeXt-B/L.
  • Docs: arXiv, Project Page
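A quick sanity check that the pinned environment above is active (a hedged snippet, not from the repo):

```python
import torch
import timm

# Versions should match the pins above: torch 2.5.1, CUDA 12.4, timm 1.0.15
print(torch.__version__, torch.version.cuda, timm.__version__)
assert torch.cuda.is_available(), "CUDA build of PyTorch not detected"
```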

Highlighted Details

  • Achieves comparable or better accuracy than normalized counterparts on ImageNet-1K for ViT and ConvNeXt models:
  • ViT-B: 82.5% (DyT) vs 82.3% (LN)
  • ViT-L: 83.6% (DyT) vs 83.1% (LN)
  • ConvNeXt-B: 83.7% (DyT) vs 83.7% (LN)
  • ConvNeXt-L: 84.4% (DyT) vs 84.3% (LN)
  • Codebase built upon the timm library and the ConvNeXt repository.

Maintenance & Community

  • Authors include researchers from FAIR, NYU, MIT, and Princeton.
  • The project is associated with the CVPR 2025 paper "Transformers without Normalization".

Licensing & Compatibility

  • Released under the MIT license, permitting commercial use and closed-source linking.

Limitations & Caveats

The provided training commands reproduce the ImageNet-1K results; adapting DyT to other tasks or custom models means following the instructions in the respective task folders or the "HowTo" guide (see the conversion sketch below). Reproducing the computational-efficiency results requires separate steps.
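For custom models, the usual recipe is to walk the module tree and swap each LayerNorm for a DyT of matching width. A minimal sketch, reusing the illustrative DyT class from the earlier snippet; the repository's "HowTo" guide is the authoritative reference:

```python
import torch.nn as nn

def convert_ln_to_dyt(module: nn.Module) -> nn.Module:
    """Recursively replace every nn.LayerNorm with a DyT of the same width."""
    if isinstance(module, nn.LayerNorm):
        # normalized_shape is a tuple; Transformer LayerNorms are 1-D here
        return DyT(module.normalized_shape[0])
    for name, child in module.named_children():
        setattr(module, name, convert_ln_to_dyt(child))
    return module

# Hypothetical usage with a timm ViT:
# model = convert_ln_to_dyt(timm.create_model("vit_base_patch16_224"))
```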

Health Check

  • Last commit: 5 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 12 stars in the last 30 days

Explore Similar Projects

Starred by Shizhe Diao (Author of LMFlow; Research Scientist at NVIDIA), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 9 more.

DiT by facebookresearch

  • Top 0.3% on SourcePulse, 8k stars
  • PyTorch implementation for diffusion models with transformers (DiT)
  • Created 2 years ago, updated 1 year ago
  • Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), Soumith Chintala (Coauthor of PyTorch), and 1 more.

jetson-inference by dusty-nv

  • Top 0.1% on SourcePulse, 9k stars
  • Vision DNN library for NVIDIA Jetson devices
  • Created 9 years ago, updated 11 months ago