PyTorch library for scaling Transformers
TorchScale provides a PyTorch library for building and scaling Transformer-based foundation models, targeting researchers and developers focused on AGI and large-scale AI. It offers implementations of novel architectures like RetNet, LongNet, and BitNet, aiming for improved generality, capability, stability, and efficiency across modalities.
How It Works
TorchScale implements advanced Transformer variants by parameterizing architectural choices within configuration objects. Key features include DeepNet for stable training of very deep Transformers, SubLN for improved generality and training stability, and X-MoE for efficient sparse Mixture-of-Experts layers. It also supports multimodal architectures and length-extrapolation techniques such as XPos.
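The sketch below illustrates this configuration-driven pattern by enabling the stability and sparsity features on a decoder; the flag names (`deepnorm`, `moe_freq`, `moe_expert_count`, `use_xmoe`) follow the upstream examples and should be verified against the installed version.

```python
# Sketch: architectural choices expressed as config flags (flag names per the
# upstream examples; verify against your installed TorchScale version).
from torchscale.architecture.config import DecoderConfig
from torchscale.architecture.decoder import Decoder

config = DecoderConfig(
    vocab_size=64000,     # placeholder vocabulary size
    deepnorm=True,        # DeepNet-style residual scaling for very deep stacks
    moe_freq=2,           # place a Mixture-of-Experts FFN every 2 layers
    moe_expert_count=8,   # experts per MoE layer
    use_xmoe=True,        # X-MoE routing for the sparse layers
)
model = Decoder(config)
```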
Quick Start & Requirements
Install the library with `pip install torchscale`. For optimized attention kernels, install `flash-attn` (Ampere+ GPUs) or `xformers` (Volta+ GPUs); CUDA 11.8 or 12.1 is required for `xformers`. Models are built by defining an `EncoderConfig`, `DecoderConfig`, or `EncoderDecoderConfig` and instantiating the respective model classes.
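A minimal end-to-end sketch of that pattern (the vocabulary size is an arbitrary placeholder):

```python
# Build a config, then instantiate the matching model class.
from torchscale.architecture.config import EncoderConfig
from torchscale.architecture.encoder import Encoder

config = EncoderConfig(vocab_size=64000)  # placeholder vocabulary size
model = Encoder(config)
print(model)
```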
Highlighted Details
Maintenance & Community
The project is actively maintained by Microsoft researchers, with contributions from Shuming Ma and Hongyu Wang. It cites several related research papers and welcomes community contributions via pull requests, adhering to a Contributor License Agreement (CLA).
Licensing & Compatibility
The project is licensed under the MIT License, permitting commercial use and integration with closed-source projects.
Limitations & Caveats
While the library is comprehensive, some advanced features such as LongViT are marked as "in progress." The library is primarily focused on PyTorch and may require adaptation for other deep learning frameworks.