SVD-LLM by AIoT-MLSys-Lab

Compressing LLMs with Singular Value Decomposition

Created 1 year ago
259 stars

Top 98.0% on SourcePulse

Project Summary

SVD-LLM addresses the challenge of compressing large language models (LLMs) by employing Singular Value Decomposition (SVD). It targets researchers and practitioners seeking to reduce model size and computational cost while maintaining performance, offering a novel truncation-aware approach.

How It Works

The core methodology applies truncation-aware SVD to LLM weight matrices. A data-whitening step, built from calibration inputs, maps each singular value directly to compression loss, so truncating the smallest singular values minimizes accuracy degradation; sequential Low-Rank Adaptation (LoRA) fine-tuning then updates the compressed parameters. The framework also supports integration with quantization techniques such as GPTQ for higher compression ratios.
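
To make the whitening-then-truncation step concrete, below is a minimal NumPy sketch applied to a single weight matrix. It is illustrative only, assuming a Cholesky-based whitening matrix built from calibration activations; the function name, damping constant, and shapes are hypothetical, and the repo's actual implementation lives in SVDLLM.py.

    import numpy as np

    def whiten_and_truncate(W, X, rank):
        """Compress W (out_dim x in_dim) using calibration activations X (in_dim x n_samples)."""
        # Whitening matrix S: Cholesky factor of the input Gram matrix, so each
        # singular value of W @ S corresponds directly to compression loss.
        gram = X @ X.T + 1e-6 * np.eye(X.shape[0])  # small damping term (an assumption)
        S = np.linalg.cholesky(gram)
        # Truncation-aware step: drop the smallest singular values of the
        # whitened weight, not of the raw weight.
        U, sigma, Vt = np.linalg.svd(W @ S, full_matrices=False)
        U_k, s_k, Vt_k = U[:, :rank], sigma[:rank], Vt[:rank, :]
        # Split sqrt(sigma) across both factors and undo the whitening on the
        # right, so W is approximated by the product of two low-rank matrices.
        A = U_k * np.sqrt(s_k)                                  # (out_dim, rank)
        B = (np.sqrt(s_k)[:, None] * Vt_k) @ np.linalg.inv(S)  # (rank, in_dim)
        return A, B

    # Toy usage: rank 409 on a 1024x1024 matrix keeps ~80% of the parameters,
    # roughly matching the quick example's 20% compression target.
    W = np.random.randn(1024, 1024)
    X = np.random.randn(1024, 2048)
    A, B = whiten_and_truncate(W, X, rank=409)
    print(A.shape, B.shape)  # (1024, 409) (409, 1024)

In the full pipeline, the truncated factors are then refined with LoRA fine-tuning, and GPTQ quantization can be layered on top for additional size reduction.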

Quick Start & Requirements

  • Installation: Requires Python 3.9 (newer versions are incompatible) and transformers version 4.35.2. Set up a conda environment (conda create -n compress python=3.9, then conda activate compress), clone the repository, and install dependencies with pip install -r requirements.txt.
  • Execution: The quick example bash compress_llama.sh compresses LLaMA-7B at a 20% compression ratio and runs evaluations (a back-of-the-envelope rank calculation for this ratio follows this list). Step-by-step instructions cover SVD compression (python SVDLLM.py --step 1), LoRA fine-tuning (python LoRA.py), and GPTQ integration (bash svdllm_gptq.sh). Evaluation scripts are also provided.
  • Resources: Links to the ICLR 2025 and NAACL 2025 papers are available. The C4 dataset is used for evaluation.
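
As referenced above, a quick way to see what a given compression ratio implies for a single layer is the following parameter-count arithmetic. This assumes the ratio denotes the fraction of parameters removed; rank_for_ratio is a hypothetical helper, not part of the repo.

    def rank_for_ratio(out_dim, in_dim, ratio):
        # Largest rank k such that the factors A (out_dim x k) and B (k x in_dim)
        # together hold at most (1 - ratio) of the original out_dim * in_dim weights.
        return int((1 - ratio) * out_dim * in_dim / (out_dim + in_dim))

    # A 4096x4096 projection (LLaMA-7B hidden size) at the quick example's 20% target:
    print(rank_for_ratio(4096, 4096, 0.2))  # -> 1638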

Highlighted Details

  • Presents work accepted at ICLR 2025 and NAACL 2025.
  • Supports compression ratios up to 0.3 (30%), with a quick example targeting 20%.
  • Evaluates compressed models on perplexity and efficiency metrics.
  • Offers integration with GPTQ for further model size reduction.

Maintenance & Community

  • No specific details on maintainers, community channels (e.g., Discord/Slack), or roadmap are provided in the README.

Licensing & Compatibility

  • The README does not specify a software license. This omission requires clarification for adoption decisions, especially concerning commercial use or derivative works.

Limitations & Caveats

  • Strict dependency on Python 3.9 and transformers version 4.35.2 may pose integration challenges.
  • The absence of a declared license is a significant blocker for determining compatibility and usage rights.
  • The README does not detail hardware requirements beyond what might be inferred for LLM operations.

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 1
  • Star History: 6 stars in the last 30 days

Explore Similar Projects

Starred by Yaowei Zheng (author of LLaMA-Factory), Chip Huyen (author of "AI Engineering" and "Designing Machine Learning Systems"), and 7 more.

llm-awq by mit-han-lab

Top 0.2% on SourcePulse
3k stars
Weight quantization research paper for LLM compression/acceleration
Created 2 years ago
Updated 3 months ago