Transformer compression via SliceGPT (ICLR'24)
This repository provides SliceGPT, a post-training sparsification technique for transformer models, including large language models (LLMs). It reduces model size and improves inference speed by applying orthogonal transformations and slicing weight matrices, while leaving the network's layer structure intact. The primary audience is researchers and practitioners who want to optimize transformer deployments.
How It Works
SliceGPT applies an orthogonal transformation to each transformer layer, computed from the principal components of the layer's activations on a small calibration set, and then slices away the rows and columns of the weight matrices that correspond to the smallest eigenvalues. Each dense weight matrix is replaced by a smaller dense one, shrinking the embedding dimension and with it memory footprint and latency. The method is designed to maintain model performance while achieving significant compression.
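The core operation can be illustrated with a short, self-contained sketch. This is not the repository's API: the function name `slice_linear_pair`, the matrix shapes, and the omission of LayerNorm conversion, residual-stream handling, and per-layer calibration are simplifying assumptions made only to show the rotate-then-slice idea.

```python
import torch

def slice_linear_pair(W_in, W_out, activations, sparsity=0.25):
    """Rotate a pair of weight matrices with an orthogonal basis computed from
    calibration activations, then drop the least significant directions.

    W_in:        (d, d_ff) weight consuming the hidden state
    W_out:       (d_ff, d) weight producing the next hidden state
    activations: (tokens, d) calibration hidden states
    """
    d = activations.shape[-1]
    k = int(d * (1.0 - sparsity))  # number of hidden dimensions to keep

    # Eigendecomposition of the activation covariance gives an orthogonal Q;
    # reorder its columns so the most significant directions come first.
    cov = activations.T @ activations / activations.shape[0]
    eigvals, Q = torch.linalg.eigh(cov)
    Q = Q[:, torch.argsort(eigvals, descending=True)]

    # Rotation is lossless: (x @ Q) @ (Q.T @ W_in) == x @ W_in because Q is
    # orthogonal. Slicing then keeps only the top-k columns of Q, shrinking
    # the hidden dimension from d to k.
    Q_k = Q[:, :k]                      # (d, k)
    W_in_sliced = Q_k.T @ W_in          # (k, d_ff)
    W_out_sliced = W_out @ Q_k          # (d_ff, k)
    return W_in_sliced, W_out_sliced, Q_k

if __name__ == "__main__":
    d, d_ff, tokens = 768, 3072, 4096
    W_in, W_out = torch.randn(d, d_ff), torch.randn(d_ff, d)
    acts = torch.randn(tokens, d)
    w_in_s, w_out_s, _ = slice_linear_pair(W_in, W_out, acts, sparsity=0.25)
    print(w_in_s.shape, w_out_s.shape)  # (576, 3072) and (3072, 576)
```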
Quick Start & Requirements
Install from the repository root in editable mode with the experiment dependencies:
pip install -e .[experiment]
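After installation, slicing runs are typically launched from the repository's experiment scripts. The invocation below is an illustrative assumption (the script path, model name, and flags may differ in the current version); consult the repository's README for the exact interface.

```
python experiments/run_slicegpt.py \
    --model facebook/opt-125m \
    --sparsity 0.25 \
    --save-dir ./sliced-model \
    --device cuda:0
```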
Highlighted Details
Maintenance & Community
Last recorded activity was about 6 months ago; the project is currently marked inactive.
Licensing & Compatibility
Limitations & Caveats