Cerebras Research: SMoE LLM compression via novel expert pruning
Top 96.7% on SourcePulse
Summary
This repository implements Router-weighted Expert Activation Pruning (REAP), a method for compressing Sparsely-activated Mixture-of-Experts (SMoE) Large Language Models (LLMs). It addresses the memory overhead of SMoEs by pruning experts, offering a significant advantage over existing expert-merging and pruning methods, particularly at 50% compression. REAP enables near-lossless compression on demanding tasks such as code generation and tool-calling, making it valuable for researchers and engineers working with large-scale SMoE models.
How It Works
REAP introduces a novel expert pruning criterion that evaluates an expert's contribution based on both router gate-values and average activation norms. This approach contrasts with expert merging, which the authors argue leads to irreducible error and functional subspace collapse by diminishing the router's independent modulation capabilities. By preserving the router's control over the remaining experts, REAP maintains a larger functional output space, resulting in superior compression performance.
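The scoring idea described above can be sketched as follows. This is a minimal illustration, not the repository's implementation: the function names, tensor shapes, and the exact form of the saliency score (gate value times activation norm, averaged over tokens) are assumptions based on the description.

```python
import numpy as np

def reap_scores(gate_values, expert_outputs):
    """Score each expert by its router gate value times its activation
    norm, averaged over tokens (hypothetical form of the REAP criterion).

    gate_values:    (tokens, experts) router weights for each expert
    expert_outputs: (tokens, experts, hidden) per-expert activations
    """
    norms = np.linalg.norm(expert_outputs, axis=-1)  # (tokens, experts)
    return (gate_values * norms).mean(axis=0)        # (experts,)

def prune_experts(scores, compression_ratio=0.5):
    """Keep the highest-scoring experts and drop the rest; the router's
    weights over the survivors are left untouched, preserving its
    independent modulation of the remaining experts."""
    n_keep = max(1, int(len(scores) * (1 - compression_ratio)))
    keep = np.argsort(scores)[::-1][:n_keep]
    return np.sort(keep)
```

Note that, unlike expert merging, nothing here averages expert weights: low-scoring experts are simply removed, which is what keeps the router's functional output space intact.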
Quick Start & Requirements
Installation can be done either in a virtual environment using uv and scripts/build.sh, or through Docker with docker compose up --build -d. Configuration involves copying and populating .env.template, plus WildBench-specific configuration files where applicable. Adding a new model requires updating src/reap/model_util.py with the model-specific attribute names of its SMoE components. The experiment execution scripts (merging-cli.sh, pruning-cli.sh) accept arguments for CUDA devices, model names, pruning/merging methods, compression ratios, and evaluation flags.
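The setup flow above might look like the following. The commands mirror the steps the README describes; the final invocation's argument names and order are illustrative placeholders, not the script's actual interface:

```shell
# Option A: local setup in a virtual environment managed by uv
uv venv && source .venv/bin/activate
bash scripts/build.sh

# Option B: containerized setup
docker compose up --build -d

# Populate the configuration template before running experiments
cp .env.template .env   # then edit .env with your values

# Run a pruning experiment -- placeholder arguments; check
# pruning-cli.sh itself for the real flags and ordering
# bash pruning-cli.sh <cuda_devices> <model_name> <method> <ratio> <eval_flags>
```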
Highlighted Details
Maintenance & Community
No specific details regarding maintenance, community channels (e.g., Discord, Slack), or notable contributors were found in the provided README.
Licensing & Compatibility
The README does not specify the software license or provide compatibility notes for commercial use or closed-source linking.
Limitations & Caveats
No explicit limitations, known bugs, or alpha status were mentioned in the provided README text.