PruneMe by arcee-ai

LLM layer pruning for computational efficiency

Created 1 year ago
250 stars

Project Summary

PruneMe automates the identification and pruning of redundant layers in Large Language Models (LLMs), significantly reducing computational costs during fine-tuning and inference. It targets engineers and researchers working with LLMs, offering a practical way to achieve substantial resource savings with minimal performance degradation.

How It Works

The core approach analyzes layer similarity within an LLM over a specified dataset. Blocks of layers exhibiting high redundancy, typically in the deeper sections of the model, are identified and then pruned with MergeKit. After pruning, Parameter-Efficient Fine-Tuning (PEFT) techniques such as QLoRA are used to "heal" the model, recovering lost performance and maintaining output quality. This strategy builds on empirical findings that deeper LLM layers often contribute less unique information than previously assumed.
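
To illustrate the similarity analysis, the sketch below scores each candidate block of n consecutive layers by how little the hidden state changes across it, so the block with the smallest mean angular distance becomes the pruning candidate. This is a simplification of the repository's layer_similarity.py, not a copy of it; the model name, sample texts, and block size are placeholders.

# Minimal sketch of the layer-similarity idea (not the repository's
# layer_similarity.py): score each block of `n` consecutive layers by the
# angular distance between the hidden state entering the block and the one
# leaving it, averaged over sample texts. Sample texts are placeholders.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

n = 8  # number of consecutive layers considered for removal
texts = ["Example document text ...", "Another sample passage ..."]

num_layers = model.config.num_hidden_layers
scores = torch.zeros(num_layers - n + 1)
for text in texts:
    inputs = tokenizer(text, return_tensors="pt", truncation=True,
                       max_length=1024).to(model.device)
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    hs = out.hidden_states  # hs[i] is the hidden state entering layer i
    for start in range(num_layers - n + 1):
        a = hs[start][:, -1, :].float()      # last-token state entering the block
        b = hs[start + n][:, -1, :].float()  # last-token state after the block
        cos = torch.nn.functional.cosine_similarity(a, b, dim=-1)
        scores[start] += torch.arccos(cos.clamp(-1, 1)).mean().item() / math.pi
scores /= len(texts)

best = int(scores.argmin())
print(f"Most redundant block: layers {best}-{best + n} "
      f"(mean angular distance {scores[best]:.4f})")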

Quick Start & Requirements

The primary workflow begins with computing layer similarity using a script within the compute_block_similarity directory. An example command is:

python layer_similarity.py --model_path "mistralai/Mistral-7B-Instruct-v0.2" \
                           --dataset "arcee-ai/sec-data-mini" \
                           --dataset_column "text" \
                           --batch_size 8 \
                           --max_length 1024 \
                           --layers_to_skip 8 \
                           --dataset_size 4000 \
                           --dataset_subset "train"

Prerequisites include a pre-trained LLM (e.g., Mistral-7B), a suitable dataset, and libraries such as MergeKit for pruning and PEFT for model healing. The pruned model can be found at arcee-ai/Mistral-7B-Instruct-v0.2-sliced-24-layer.
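
Pruning itself is performed by MergeKit using a passthrough merge over layer slices. The snippet below is a hypothetical example of generating such a config and invoking the mergekit-yaml CLI; the layer ranges shown (dropping layers 20-27 to leave 24 layers) are an illustrative assumption and should be replaced with the block reported by the similarity analysis.

# Hypothetical MergeKit pruning config: keep layers 0-19 and 28-31, dropping
# an 8-layer block assumed to be the most redundant. Replace the ranges with
# the block identified by the similarity analysis.
import subprocess
import yaml

model = "mistralai/Mistral-7B-Instruct-v0.2"
config = {
    "slices": [
        {"sources": [{"model": model, "layer_range": [0, 20]}]},
        {"sources": [{"model": model, "layer_range": [28, 32]}]},
    ],
    "merge_method": "passthrough",
    "dtype": "bfloat16",
}

with open("prune_config.yml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)

# mergekit-yaml <config> <output_dir> is MergeKit's standard entry point.
subprocess.run(["mergekit-yaml", "prune_config.yml", "./mistral-7b-pruned"],
               check=True)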

Highlighted Details

  • Empirical validation of layer redundancy in LLMs like Mistral-7B, supporting the "Unreasonable Ineffectiveness of the Deeper Layers" hypothesis.
  • Demonstrated ability of pruned models to generate coherent text and recover performance via PEFT methods like QLoRA (see the healing sketch after this list).
  • Potential for efficient domain adaptation and merging of LLMs through Data Flow Space (DFS) Merging.
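
Expanding on the QLoRA healing bullet above, a minimal sketch of the healing step might look like the following. The pruned-model path, dataset split, and hyperparameters are placeholders, not the project's published recipe.

# Minimal QLoRA "healing" sketch using transformers + peft + bitsandbytes.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

pruned_model = "./mistral-7b-pruned"  # output of the MergeKit step (placeholder path)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(pruned_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    pruned_model, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Small slice of the same dataset used for the similarity analysis.
dataset = load_dataset("arcee-ai/sec-data-mini", split="train[:1%]")
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=1024)
tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="./healed", per_device_train_batch_size=1,
                           gradient_accumulation_steps=8, num_train_epochs=1,
                           learning_rate=2e-4, bf16=True, logging_steps=10),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()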

Maintenance & Community

This repository is marked as an "unofficial" implementation. No specific community channels, roadmap, or notable contributor information are provided in the README.

Licensing & Compatibility

The README does not explicitly state the project's license. This omission represents a significant caveat for potential adopters, especially concerning commercial use or integration into closed-source projects.

Limitations & Caveats

This is an "unofficial" implementation. The specific license is not stated in the README, posing a potential adoption blocker. Setup requires multiple distinct steps: similarity computation, pruning via MergeKit, and optional PEFT healing. The effectiveness of pruning may vary based on the model architecture and the chosen dataset.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 5 stars in the last 30 days
