PyTorch plugin for efficient Transformer-based model inference
EET (Easy and Efficient Transformer) is a PyTorch inference plugin designed to optimize the performance and affordability of large Transformer-based NLP and multi-modal models. It targets researchers and developers working with models like GPT-3, BERT, CLIP, Baichuan, and LLaMA, offering significant speedups and reduced memory footprints for single-GPU inference.
How It Works
EET achieves its performance gains through a combination of CUDA kernel optimizations and quantization/sparsity algorithms. It provides low-level "Operators APIs" that can be composed to build custom model architectures, as well as higher-level "Model APIs" that seamlessly integrate with Hugging Face Transformers and Fairseq models. This layered approach allows for both deep customization and easy adoption.
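The layered design described above can be illustrated abstractly. This is only a sketch of the pattern, not EET's actual API: all class and function names here (`embedding_op`, `ffn_op`, `TinyModel`, `PretrainedAdapter`) are hypothetical stand-ins for low-level operators and a high-level pretrained-model wrapper.

```python
# Illustrative sketch of a layered inference API (hypothetical names,
# not EET's real classes): low-level operators compose into custom
# models, while a high-level wrapper adapts an existing checkpoint.

def embedding_op(token_ids, table):
    # Low-level "operator": look up an embedding for each token id.
    return [table[t] for t in token_ids]

def ffn_op(vectors, scale):
    # Low-level "operator": stand-in for a fused feed-forward kernel.
    return [v * scale for v in vectors]

class TinyModel:
    """Custom architecture composed directly from operator APIs."""
    def __init__(self, table, scale):
        self.table, self.scale = table, scale

    def forward(self, token_ids):
        return ffn_op(embedding_op(token_ids, self.table), self.scale)

class PretrainedAdapter(TinyModel):
    """High-level "Model API": wraps weights exported elsewhere."""
    @classmethod
    def from_pretrained(cls, weights):
        return cls(weights["embeddings"], weights["scale"])

weights = {"embeddings": {0: 1.0, 1: 2.0}, "scale": 10.0}
model = PretrainedAdapter.from_pretrained(weights)
print(model.forward([0, 1]))  # -> [10.0, 20.0]
```

The same split motivates EET's two API levels: operators for deep customization, model wrappers for drop-in adoption of Hugging Face or Fairseq checkpoints.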
Quick Start & Requirements
Build the provided Docker image:
docker build -t eet_docker:0.1 .
then run it with GPU access:
nvidia-docker run ...
Alternatively, clone the repository and install from source with pip install .
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats