SwissArmyTransformer by THUDM

Transformer library for flexible model development

  • Created 3 years ago
  • 1,086 stars
  • Top 35.6% on sourcepulse

Project Summary

SwissArmyTransformer (SAT) is a flexible library for developing and training custom Transformer variants, targeting researchers and engineers working with large language models. It simplifies the creation of novel architectures by allowing users to compose existing models with lightweight "mixins" for features like prefix-tuning or custom embeddings, while leveraging DeepSpeed and model parallelism for efficient large-scale training.

How It Works

SAT employs a mixin-based architecture, enabling modular extension of base Transformer models (BERT, GPT, T5, GLM, etc.). Users define new functionalities as mixins, which are then added to a model instance via add_mixin. This approach allows for code reuse and rapid prototyping of model variations, such as integrating prefix-tuning or custom positional embeddings with minimal code changes. It also supports efficient inference for autoregressive models through state caching mixins.
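Below is a minimal sketch of the mixin pattern, assuming SAT's BaseMixin base class and the add_mixin hook described above. The import path, the position_embedding_forward hook name, and the commented-out model attachment are assumptions that may differ between SAT releases; treat this as an illustration rather than the library's exact API.

```python
import torch
from sat.model import BaseMixin  # import path is an assumption; older releases expose SwissArmyTransformer.model

class LearnablePositionMixin(BaseMixin):
    """Illustrative mixin that swaps in a trainable positional embedding."""
    def __init__(self, max_len, hidden_size):
        super().__init__()
        self.pos_emb = torch.nn.Embedding(max_len, hidden_size)

    # Hook name follows SAT's *_forward override convention (assumed); when a mixin
    # defines it, the base model's positional-embedding step is replaced by this one.
    def position_embedding_forward(self, position_ids, **kw_args):
        return self.pos_emb(position_ids)

# Attaching the mixin to an already-built model instance (model construction omitted):
# model.add_mixin('learnable-pos', LearnablePositionMixin(max_len=1024, hidden_size=1024))
```

Because the mixin only overrides the hooks it cares about, the same base model can carry several such mixins at once, which is what makes prototyping architecture variants cheap.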

Quick Start & Requirements

  • Install: pip install SwissArmyTransformer
  • Requirements: Python, PyTorch, DeepSpeed. GPU and CUDA recommended for training/inference.
  • Resources: Training large models (10B+ parameters) requires significant GPU memory and distributed training setup.
  • Docs: Tutorials cover loading pretrained models and custom training; a minimal loading sketch follows this list.
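As a quick-start illustration, the sketch below loads a pretrained checkpoint through SAT's AutoModel.from_pretrained entry point. The import path, the argument fields, the argument order, and the checkpoint name are assumptions and vary across SAT versions, so check the project's tutorials for the exact call.

```python
import argparse
from sat.model import AutoModel  # import path is an assumption; some releases expose `from sat import AutoModel`

# Minimal hand-rolled argument namespace; real scripts typically build this with
# SAT's own argument parser, and the required fields differ between versions.
args = argparse.Namespace(fp16=False, skip_init=False, device='cpu')

# Downloads (or locates) the named checkpoint and rebuilds it on SAT's Transformer
# backbone; the checkpoint name here is illustrative.
model, model_args = AutoModel.from_pretrained('bert-base-uncased', args)
model.eval()
```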

Highlighted Details

  • Supports modular extension of various Transformer architectures (BERT, GPT, T5, GLM, ViT) via mixins.
  • Integrates DeepSpeed ZeRO-2 and activation checkpointing for efficient large-scale training (see the config sketch after this list).
  • Enables easy addition of techniques like prefix-tuning, custom positional embeddings, and autoregressive caching.
  • Claims to be the only open-source codebase that supports finetuning T5-10B on GPUs.
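For context on the ZeRO-2 and activation-checkpointing integration, here is a hedged sketch of a generic DeepSpeed configuration expressing those two features. This is plain DeepSpeed configuration, not an SAT-specific file; SAT's training scripts may set equivalent options through their own launch arguments, and the batch sizes and fp16 setting below are illustrative.

```python
import json

# Generic DeepSpeed settings: ZeRO stage 2 shards optimizer state and gradients
# across data-parallel ranks; activation checkpointing recomputes activations
# during the backward pass to cut memory use.
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
    "activation_checkpointing": {
        "partition_activations": True,              # split saved activations across model-parallel ranks
        "contiguous_memory_optimization": False
    }
}

with open("ds_config.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```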

Maintenance & Community

  • Developed by THUDM.
  • Based on DeepSpeed, Megatron-LM, and Huggingface transformers.
  • Contribution tutorials are planned.

Licensing & Compatibility

  • License: MIT.
  • Compatibility: Permissive license allows for commercial use and integration with closed-source projects.

Limitations & Caveats

The library is focused on Transformer architectures and may require significant adaptation for non-Transformer models. While it supports very large models, using it effectively requires familiarity with distributed training frameworks such as DeepSpeed.

Health Check

  • Last commit: 7 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 12 stars in the last 90 days

Explore Similar Projects

Starred by Lewis Tunstall (Researcher at Hugging Face), Lysandre Debut (Chief Open-Source Officer at Hugging Face), and 3 more.

FARM by deepset-ai

  • 2k stars
  • NLP framework for transfer learning with BERT & Co
  • Created 6 years ago, updated 1 year ago

Starred by Aravind Srinivas (Cofounder of Perplexity), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 12 more.

DeepSpeed by deepspeedai

  • 40k stars
  • Deep learning optimization library for distributed training and inference
  • Created 5 years ago, updated 1 day ago