SwissArmyTransformer by THUDM

Transformer library for flexible model development

Created 4 years ago
1,090 stars

Top 34.9% on SourcePulse

View on GitHub
Project Summary

SwissArmyTransformer (SAT) is a flexible library for developing and training custom Transformer variants, targeting researchers and engineers working with large language models. It simplifies the creation of novel architectures by allowing users to compose existing models with lightweight "mixins" for features like prefix-tuning or custom embeddings, while leveraging DeepSpeed and model parallelism for efficient large-scale training.

How It Works

SAT employs a mixin-based architecture, enabling modular extension of base Transformer models (BERT, GPT, T5, GLM, etc.). Users define new functionalities as mixins, which are then added to a model instance via add_mixin. This approach allows for code reuse and rapid prototyping of model variations, such as integrating prefix-tuning or custom positional embeddings with minimal code changes. It also supports efficient inference for autoregressive models through state caching mixins.
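
To make the pattern concrete, the sketch below defines a small mixin that overrides the position-embedding hook and shows how it would be attached with add_mixin. The import path, hook name, and signature are assumptions based on the SAT tutorials and may differ between releases, so treat this as a sketch rather than canonical API usage.

```python
# Sketch only: import path and hook name are assumptions from the SAT tutorials
# (older releases use the SwissArmyTransformer package name instead of sat).
import torch
from sat.model import BaseMixin  # assumed import path

class LearnablePositionMixin(BaseMixin):
    """Replaces the base model's position-embedding lookup with a fresh table."""
    def __init__(self, max_len: int, hidden_size: int):
        super().__init__()
        self.pos = torch.nn.Embedding(max_len, hidden_size)

    def position_embedding_forward(self, position_ids, **kw_args):
        # Hook name assumed; SAT dispatches to this instead of the base lookup.
        return self.pos(position_ids)

# Attach to an existing model instance under a unique mixin name:
# model.add_mixin('learnable-pos', LearnablePositionMixin(1024, hidden_size))
```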

Quick Start & Requirements

  • Install: pip install SwissArmyTransformer
  • Requirements: Python, PyTorch, DeepSpeed. GPU and CUDA recommended for training/inference.
  • Resources: Training large models (10B+ parameters) requires significant GPU memory and distributed training setup.
  • Docs: Tutorials available for using pretrained models and custom training; a minimal loading sketch follows this list.
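
For orientation, the snippet below sketches how a pretrained checkpoint is typically loaded. The get_args and AutoModel.from_pretrained entry points follow the SAT tutorials, but the exact names, signatures, and checkpoint identifiers are assumptions and may vary by version.

```python
# Sketch only: entry points and checkpoint name are assumptions based on the
# SAT tutorials; verify against the documentation for your installed version.
from sat import get_args, AutoModel  # assumed top-level exports

args = get_args()  # parses model/training arguments from the command line
# Downloads (or reuses) the checkpoint and returns the model plus updated args.
model, args = AutoModel.from_pretrained('bert-base-uncased', args)  # placeholder checkpoint name
model = model.eval().cuda()  # GPU recommended for inference
```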

Highlighted Details

  • Supports modular extension of various Transformer architectures (BERT, GPT, T5, GLM, ViT) via mixins.
  • Integrates DeepSpeed ZeRO-2 and activation checkpointing for efficient large-scale training.
  • Enables easy addition of techniques like prefix-tuning, custom positional embeddings, and autoregressive caching (a framework-agnostic prefix-tuning sketch follows this list).
  • Claims to be the only open-source codebase supporting finetuning T5-10B on GPUs.
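
To make the prefix-tuning item above concrete, here is a framework-agnostic sketch of what a prefix-tuning mixin computes: learned key/value vectors are prepended to each attention layer so that only those parameters need training. This is plain PyTorch for illustration with hypothetical names; it does not use SAT's mixin hooks.

```python
# Illustration of the prefix-tuning idea in plain PyTorch (hypothetical names,
# not SAT API). A frozen base attention layer would call this with its q/k/v.
import torch
import torch.nn.functional as F

class PrefixAttention(torch.nn.Module):
    def __init__(self, hidden_size: int, num_heads: int, prefix_len: int):
        super().__init__()
        self.head_dim = hidden_size // num_heads
        # Trainable prefix keys/values: (prefix_len, num_heads, head_dim).
        self.prefix_k = torch.nn.Parameter(0.02 * torch.randn(prefix_len, num_heads, self.head_dim))
        self.prefix_v = torch.nn.Parameter(0.02 * torch.randn(prefix_len, num_heads, self.head_dim))

    def forward(self, q, k, v):
        # q, k, v: (batch, heads, seq, head_dim); attention masking omitted for brevity.
        b = q.size(0)
        pk = self.prefix_k.permute(1, 0, 2).unsqueeze(0).expand(b, -1, -1, -1)
        pv = self.prefix_v.permute(1, 0, 2).unsqueeze(0).expand(b, -1, -1, -1)
        k = torch.cat([pk, k], dim=2)  # prepend prefix keys
        v = torch.cat([pv, v], dim=2)  # prepend prefix values
        scores = q @ k.transpose(-1, -2) / self.head_dim ** 0.5
        return F.softmax(scores, dim=-1) @ v
```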

Maintenance & Community

  • Developed by THUDM.
  • Builds on DeepSpeed, Megatron-LM, and Hugging Face Transformers.
  • Contribution tutorials are planned.

Licensing & Compatibility

  • License: MIT.
  • Compatibility: Permissive license allows for commercial use and integration with closed-source projects.

Limitations & Caveats

The library is primarily focused on Transformer architectures and may require significant adaptation for non-Transformer models. While it supports very large models, using them effectively requires a solid understanding of distributed training frameworks such as DeepSpeed.

Health Check

  • Last Commit: 8 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days

Explore Similar Projects

Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), Lewis Tunstall (Research Engineer at Hugging Face), and 4 more.

fastformers by microsoft
707 stars · Created 5 years ago · Updated 6 months ago
NLU optimization recipes for transformer models

Starred by Luca Soldaini (Research Scientist at Ai2), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 4 more.

parallelformers by tunib-ai
790 stars · Created 4 years ago · Updated 2 years ago
Toolkit for easy model parallelization

Starred by Tobi Lutke (Cofounder of Shopify), Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), and 5 more.

matmulfreellm by ridgerchu
3k stars · Created 1 year ago · Updated 1 month ago
MatMul-free language models

Starred by Tobi Lutke (Cofounder of Shopify), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 6 more.

xTuring by stochasticai
3k stars · Created 2 years ago · Updated 1 day ago
SDK for fine-tuning and customizing open-source LLMs