TransformerCompression by microsoft

Transformer compression via SliceGPT (ICLR'24)

Created 1 year ago
445 stars

Top 67.5% on SourcePulse

View on GitHub
Project Summary

This repository provides SliceGPT, a post-training sparsification technique for transformer models, including Large Language Models (LLMs). It enables users to reduce model size and improve inference speed by applying orthogonal transformations and slicing weight matrices, without altering the model's architecture. The primary audience includes researchers and practitioners seeking to optimize transformer deployments.

How It Works

SliceGPT applies an orthogonal transformation to each transformer layer, computed from a principal component analysis (PCA) of that layer's activations on a small calibration set, and then slices off the rows and columns of the weight matrices corresponding to the directions with the smallest eigenvalues. This replaces each dense weight matrix with a smaller one, reducing the embedding dimension and with it memory footprint and latency, while the method is designed to maintain model performance despite significant compression.
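The rotate-then-slice step can be illustrated on a toy layer. The sketch below uses NumPy and synthetic data; it is only a simplified rendering of the idea (SliceGPT itself computes the PCA per layer on real calibration data and slices several matrices consistently), not the repository's implementation, and every name and number in it is made up for the example.

```python
# A minimal NumPy sketch of the rotate-then-slice idea described above.
# Illustration of the principle only, not the repository's code.
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_model, d_keep = 1024, 64, 48          # slice away 25% of the width

# Synthetic calibration activations with a fast-decaying spectrum,
# as transformer activations typically have.
scales = 2.0 ** -np.arange(d_model)
basis, _ = np.linalg.qr(rng.standard_normal((d_model, d_model)))
X = (rng.standard_normal((n_tokens, d_model)) * scales) @ basis.T
W = rng.standard_normal((d_model, d_model))       # a dense layer weight

# PCA of the activations: eigenvectors of the covariance form an orthogonal Q.
eigvals, Q = np.linalg.eigh(X.T @ X / n_tokens)
Q = Q[:, np.argsort(eigvals)[::-1]]               # sort directions by importance

# Rotate the layer into the PCA basis, then slice off the trailing directions.
W_rot = Q.T @ W @ Q                               # same layer, rotated basis
X_small = (X @ Q)[:, :d_keep]                     # sliced activations
W_small = W_rot[:d_keep, :d_keep]                 # smaller dense matrix

# The sliced layer approximates the original output in the kept directions.
ref = (X @ W @ Q)[:, :d_keep]
rel_err = np.linalg.norm(ref - X_small @ W_small) / np.linalg.norm(ref)
print(f"relative error after slicing: {rel_err:.2e}")
```

Because the activations concentrate in a few principal directions, the discarded rows and columns carry little signal and the relative error stays small even though the matrices shrink.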

Quick Start & Requirements

  • Install: pip install -e .[experiment]
  • Prerequisites: CUDA-enabled GPU, Python. Hugging Face authentication may be required for certain models (see the sketch after this list).
  • Documentation: Hugging Face discussion
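
For gated checkpoints, the Hugging Face authentication mentioned above typically means supplying an access token when loading the model. A minimal sketch using the standard transformers API; the model name and the HF_TOKEN environment variable are illustrative assumptions, not repository requirements:

```python
# Illustrative only: loading a gated Hugging Face model with an access token
# before handing it to the repository's slicing experiments.
import os
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"        # example of a gated model family
hf_token = os.environ.get("HF_TOKEN")          # your Hugging Face access token

tokenizer = AutoTokenizer.from_pretrained(model_name, token=hf_token)
model = AutoModelForCausalLM.from_pretrained(model_name, token=hf_token)
# The loaded model can then be passed to the repo's slicing entry points
# described in its README.
```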

Highlighted Details

  • Reduces model size and memory footprint.
  • Achieves inference speedups without requiring additional code optimization.
  • Supports recovery fine-tuning (RFT) for performance restoration.
  • Integrates with LM Eval Harness for evaluation.
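
The LM Eval Harness integration is handled by the repo's own experiment scripts; as a generic reference, evaluating a saved checkpoint with lm-eval's Python API looks roughly like the sketch below. The checkpoint path and task list are placeholders, and a sliced checkpoint may need the repo's adapters to load.

```python
# Generic LM Eval Harness usage, not the repository's evaluation script.
# The checkpoint path and task names below are placeholders.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                         # Hugging Face backend
    model_args="pretrained=./path-to-model-checkpoint", # placeholder local path
    tasks=["piqa", "arc_easy"],                         # example zero-shot tasks
    num_fewshot=0,
)
print(results["results"])
```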

Maintenance & Community

  • Developed by Microsoft.
  • Contributions are welcomed, subject to a Contributor License Agreement (CLA).
  • Follows the Microsoft Open Source Code of Conduct.

Licensing & Compatibility

  • The repository does not explicitly state a license in the provided README.

Limitations & Caveats

  • Adding support for new models requires implementing custom model and layer adapters.
  • Hugging Face authentication might be necessary for specific models.

Health Check

  • Last commit: 8 months ago
  • Responsiveness: 1 week
  • Pull requests (30d): 0
  • Issues (30d): 0

Star History

  • 6 stars in the last 30 days

Explore Similar Projects

Starred by Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI).

dots.llm1 by rednote-hilab
MoE model for research
0.2% · 462 stars · Created 4 months ago · Updated 4 weeks ago

Starred by Luca Soldaini (Research Scientist at Ai2), Edward Sun (Research Scientist at Meta Superintelligence Lab), and 4 more.

parallelformers by tunib-ai
Toolkit for easy model parallelization
0% · 790 stars · Created 4 years ago · Updated 2 years ago

Starred by Tobi Lutke (Cofounder of Shopify), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 6 more.

xTuring by stochasticai
SDK for fine-tuning and customizing open-source LLMs
0.0% · 3k stars · Created 2 years ago · Updated 1 day ago

Starred by Tobi Lutke (Cofounder of Shopify), Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), and 36 more.

unsloth by unslothai
Finetuning tool for LLMs, targeting speed and memory efficiency
0.6% · 46k stars · Created 1 year ago · Updated 14 hours ago