TransformerCompression by microsoft

Transformer compression via SliceGPT (ICLR'24)

created 1 year ago
436 stars

Top 69.4% on sourcepulse

Project Summary

This repository provides SliceGPT, a post-training sparsification technique for transformer models, including large language models (LLMs). It reduces model size and speeds up inference by applying orthogonal transformations to the weights and then slicing rows and columns off the weight matrices, leaving the rest of the model's architecture unchanged. The primary audience is researchers and practitioners who want to optimize transformer deployments.

How It Works

SliceGPT applies an orthogonal transformation to each transformer layer, then slices off the least significant rows and columns of the weight matrices, with significance judged by eigenvalue decay. Each dense weight matrix is replaced by a smaller dense one, which shrinks the embedding dimension and, with it, the memory footprint and latency. The method is designed to preserve model performance while achieving significant compression.
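The rotate-then-slice idea can be illustrated on a toy pair of linear maps. The sketch below is not the repository's code: it uses plain NumPy, omits attention, normalization, and residual connections, and every name and size in it (pca_rotation, rotate_and_slice, H, W_prev, W_next, keep) is a hypothetical chosen for illustration. It assumes the orthogonal matrix is taken from the eigenvectors of the activation covariance, ordered by eigenvalue, so that the directions sliced away are the ones with the smallest eigenvalues.

```python
import numpy as np


def pca_rotation(X: np.ndarray) -> np.ndarray:
    """Return an orthogonal matrix whose columns are the eigenvectors of
    X^T X, ordered from largest to smallest eigenvalue."""
    eigvals, eigvecs = np.linalg.eigh(X.T @ X)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    return eigvecs[:, order]


def rotate_and_slice(W_prev, W_next, Q, keep):
    """Fold Q into the two weight matrices that share a hidden dimension,
    then keep only the first `keep` rotated directions."""
    W_prev_small = (W_prev @ Q)[:, :keep]    # slice columns of the producer
    W_next_small = (Q.T @ W_next)[:keep, :]  # slice rows of the consumer
    return W_prev_small, W_next_small


# Toy setup; all sizes are illustrative assumptions.
rng = np.random.default_rng(0)
n, d_in, d, d_out = 256, 32, 64, 32
H = rng.normal(size=(n, d_in))        # inputs to the first linear map
W_prev = rng.normal(size=(d_in, d))   # writes into the hidden dimension d
W_next = rng.normal(size=(d, d_out))  # reads from the hidden dimension d

X = H @ W_prev        # hidden activations, rank at most d_in
Y_full = X @ W_next   # reference output of the unsliced pair

Q = pca_rotation(X)
keep = 48             # drop 25% of the hidden dimension
W_prev_small, W_next_small = rotate_and_slice(W_prev, W_next, Q, keep)
Y_sliced = H @ W_prev_small @ W_next_small

params_before = W_prev.size + W_next.size
params_after = W_prev_small.size + W_next_small.size
ratio = params_after / params_before

print("max abs error:", np.max(np.abs(Y_full - Y_sliced)))
print("parameter ratio:", ratio)
```

In this toy case the slicing loses nothing, because the hidden activations have rank at most 32 and the 16 discarded directions carry eigenvalues that are zero up to rounding. In a real model the trailing eigenvalues are small rather than zero, so slicing is an approximation.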

Quick Start & Requirements

  • Install: pip install -e .[experiment]
  • Prerequisites: CUDA-enabled GPU, Python. Hugging Face authentication may be required for certain models.
  • Documentation: Hugging Face discussion

Highlighted Details

  • Reduces model size and memory footprint.
  • Achieves inference speedups without additional code optimization.
  • Supports recovery fine-tuning (RFT) for performance restoration.
  • Integrates with LM Eval Harness for evaluation.

Maintenance & Community

  • Developed by Microsoft.
  • Contributions are welcomed, subject to a Contributor License Agreement (CLA).
  • Follows the Microsoft Open Source Code of Conduct.

Licensing & Compatibility

  • The repository does not explicitly state a license in the provided README.

Limitations & Caveats

  • Adding support for new models requires implementing custom model and layer adapters.
  • Hugging Face authentication might be necessary for specific models.

Health Check

  • Last commit: 6 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 15 stars in the last 90 days
