PyTorch plugin for efficient Transformer-based model inference
EET (Easy and Efficient Transformer) is a PyTorch inference plugin designed to optimize the performance and affordability of large Transformer-based NLP and multi-modal models. It targets researchers and developers working with models like GPT-3, BERT, CLIP, Baichuan, and LLaMA, offering significant speedups and reduced memory footprints for single-GPU inference.
How It Works
EET achieves its performance gains through a combination of CUDA kernel optimizations and quantization/sparsity algorithms. It provides low-level "Operators APIs" that can be composed to build custom model architectures, as well as higher-level "Model APIs" that seamlessly integrate with Hugging Face Transformers and Fairseq models. This layered approach allows for both deep customization and easy adoption.
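The layered design described above can be illustrated abstractly. This is only a sketch of the pattern, not EET's actual API: all class and function names here (`embedding_op`, `ffn_op`, `TinyModel`, `PretrainedAdapter`) are hypothetical stand-ins for low-level operators and a high-level pretrained-model wrapper.

```python
# Illustrative sketch of a layered inference API (hypothetical names,
# not EET's real classes): low-level operators compose into custom
# models, while a high-level wrapper adapts an existing checkpoint.

def embedding_op(token_ids, table):
    # Low-level "operator": look up an embedding for each token id.
    return [table[t] for t in token_ids]

def ffn_op(vectors, scale):
    # Low-level "operator": stand-in for a fused feed-forward kernel.
    return [v * scale for v in vectors]

class TinyModel:
    """Custom architecture composed directly from operator APIs."""
    def __init__(self, table, scale):
        self.table, self.scale = table, scale

    def forward(self, token_ids):
        return ffn_op(embedding_op(token_ids, self.table), self.scale)

class PretrainedAdapter(TinyModel):
    """High-level "Model API": wraps weights exported elsewhere."""
    @classmethod
    def from_pretrained(cls, weights):
        return cls(weights["embeddings"], weights["scale"])

weights = {"embeddings": {0: 1.0, 1: 2.0}, "scale": 10.0}
model = PretrainedAdapter.from_pretrained(weights)
print(model.forward([0, 1]))  # -> [10.0, 20.0]
```

The same split motivates EET's two API levels: operators for deep customization, model wrappers for drop-in adoption of Hugging Face or Fairseq checkpoints.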
Quick Start & Requirements
Build the provided Docker image:
docker build -t eet_docker:0.1 .
then run it with GPU access:
nvidia-docker run ...
Alternatively, clone the repository and install from source with pip install .
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats