EET by NetEase-FuXi

PyTorch plugin for efficient Transformer-based model inference

Created 4 years ago
265 stars

Top 96.6% on SourcePulse

Project Summary

EET (Easy and Efficient Transformer) is a PyTorch inference plugin that speeds up large Transformer-based NLP and multi-modal models while reducing their memory footprint. It targets researchers and developers working with models such as GPT-3, BERT, CLIP, Baichuan, and LLaMA, offering significant speedups and lower memory cost for single-GPU inference.

How It Works

EET achieves its performance gains through a combination of CUDA kernel optimizations and quantization/sparsity algorithms. It provides low-level "Operators APIs" that can be composed to build custom model architectures, as well as higher-level "Model APIs" that seamlessly integrate with Hugging Face Transformers and Fairseq models. This layered approach allows for both deep customization and easy adoption.
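As a sketch of the higher-level Model API described above, the loader below assumes EET exposes a Transformers-style `from_pretrained` entry point; the class name `EETBertModel` and its argument order are assumptions, not verified against EET's documentation:

```python
# Hedged sketch: load a BERT model through EET's assumed Model API,
# falling back to plain Hugging Face Transformers when EET is absent.
# `EETBertModel` and the argument order are assumptions.
def load_bert(model_id: str, max_batch: int = 4, data_type: str = "fp16"):
    try:
        from eet import EETBertModel  # assumed EET drop-in class
        # EET targets single-GPU inference with fused CUDA kernels.
        return EETBertModel.from_pretrained(model_id, max_batch, data_type)
    except ImportError:
        from transformers import BertModel  # unaccelerated baseline
        return BertModel.from_pretrained(model_id)
```

The fallback illustrates the "easy adoption" claim: because the Model API mirrors Transformers, swapping the accelerated class in or out leaves the rest of the inference code unchanged.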

Quick Start & Requirements

  • Installation: Recommended via Docker (docker build -t eet_docker:0.1 . then nvidia-docker run ...). Alternatively, clone the repo and pip install . from source.
  • Prerequisites: CUDA >= 11.4, Python >= 3.7, GCC >= 7.4.0, PyTorch >= 1.12.0, NumPy >= 1.19.1, Fairseq == 0.10.0, Transformers >= 4.31.0.
  • Resources: Runs very large ("mega-size") models on a single GPU.
  • Docs: https://github.com/NetEase-FuXi/EET/tree/main/example/python

Highlighted Details

  • Supports int8 quantization for further optimization.
  • Offers out-of-the-box integration with Transformers and Fairseq.
  • Provides ready-made pipelines for common NLP tasks (text-classification, fill-mask, text-generation, etc.).
  • Claims 2-8x speedup for models like GPT-3 and T5.
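The ready-made pipelines listed above appear to follow the Hugging Face pipeline pattern; a minimal sketch, assuming EET exposes a `pipeline` factory mirroring Transformers' (the EET import path is an assumption):

```python
# Hedged sketch: EET's ready-made pipelines are assumed to mirror the
# Hugging Face `pipeline` factory; the EET import path is an assumption.
def make_pipeline(task: str, model_id: str):
    try:
        from eet import pipeline  # assumed EET pipeline factory
        return pipeline(task, model=model_id)
    except ImportError:
        from transformers import pipeline  # standard HF fallback
        return pipeline(task, model=model_id)

# Example usage (downloads the model on first call):
# unmasker = make_pipeline("fill-mask", "bert-base-uncased")
# unmasker("Paris is the capital of [MASK].")
```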

Maintenance & Community

  • The last commit was 9 months ago, with no pull requests or issues opened in the past 30 days (see Health Check below).

Licensing & Compatibility

  • The repository does not explicitly state a license in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • The project is described as "beta" in some sections.
  • Only pre-padding is supported for GPT-3.
  • The license is not clearly defined, which may impact commercial adoption.
Health Check

  • Last Commit: 9 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 2 stars in the last 30 days

Explore Similar Projects

Starred by Luca Soldaini (Research Scientist at Ai2), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 4 more.

parallelformers by tunib-ai

790 stars
Toolkit for easy model parallelization
Created 4 years ago · Updated 2 years ago
Starred by Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems") and Jeremy Howard (Cofounder of fast.ai).

GPTFast by MDK8888

687 stars
HF Transformers accelerator for faster inference
Created 1 year ago · Updated 1 year ago
Starred by Jeremy Howard (Cofounder of fast.ai) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

SwissArmyTransformer by THUDM

1k stars
Transformer library for flexible model development
Created 4 years ago · Updated 8 months ago
Starred by Tobi Lutke (Cofounder of Shopify), Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), and 5 more.

matmulfreellm by ridgerchu

3k stars
MatMul-free language models
Created 1 year ago · Updated 1 month ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jeff Hammerbacher (Cofounder of Cloudera), and 4 more.

gemma_pytorch by google

6k stars
PyTorch implementation for Google's Gemma models
Created 1 year ago · Updated 3 months ago
Starred by Nat Friedman (Former CEO of GitHub), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 15 more.

FasterTransformer by NVIDIA

6k stars
Optimized transformer library for inference
Created 4 years ago · Updated 1 year ago