NetEase-FuXi
PyTorch plugin for efficient Transformer-based model inference
Top 96.7% on SourcePulse
EET (Easy and Efficient Transformer) is a PyTorch inference plugin designed to optimize the performance and affordability of large Transformer-based NLP and multi-modal models. It targets researchers and developers working with models like GPT-3, BERT, CLIP, Baichuan, and LLaMA, offering significant speedups and reduced memory footprints for single-GPU inference.
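To make the memory-footprint point concrete, the short sketch below estimates how much GPU memory the weights alone occupy at different precisions. It is back-of-the-envelope arithmetic, not a measurement of EET: the function name `weight_memory_gib` and the 7-billion-parameter figure are illustrative assumptions, and activations, caches, and framework overhead are ignored.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gib(num_params: int, dtype: str) -> float:
    """Return the memory needed for the weights alone, in GiB.

    Hypothetical helper for illustration; it is not part of EET.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    # Assumed model size, chosen only as an example.
    num_params = 7_000_000_000
    for dtype in BYTES_PER_PARAM:
        size = weight_memory_gib(num_params, dtype)
        print(f"{num_params:,} parameters in {dtype}: {size:.1f} GiB")
```

Halving the bytes per parameter halves the weight storage, which is often what decides whether a model fits on a single GPU.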
How It Works
EET achieves its performance gains through a combination of CUDA kernel optimizations and quantization/sparsity algorithms. It provides low-level "Operators APIs" that can be composed to build custom model architectures, as well as higher-level "Model APIs" that seamlessly integrate with Hugging Face Transformers and Fairseq models. This layered approach allows for both deep customization and easy adoption.
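As a rough illustration of what weight quantization means, the sketch below applies symmetric per-tensor int8 quantization to a small list of floats and then restores approximate values. This is a generic textbook scheme written in plain Python, assumed here only for illustration; the source does not describe EET's actual quantization algorithm or CUDA kernels, and the names `quantize_int8` and `dequantize_int8` are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127].

    Returns the integer values and the scale needed to restore them.
    """
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Avoid dividing by zero when every weight is zero (or the list is empty).
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Restore approximate float values from the integers and their scale."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0]
    quantized, scale = quantize_int8(weights)
    restored = dequantize_int8(quantized, scale)
    for original, approx in zip(weights, restored):
        print(f"{original:+.4f} -> {approx:+.4f}")
```

Each weight is stored in one byte instead of two or four, at the cost of a rounding error of at most half the scale per value.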
Quick Start & Requirements
Build and run the Docker image (docker build -t eet_docker:0.1 . then nvidia-docker run ...). Alternatively, clone the repo and run pip install . from source.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats