Load balancer for expert-parallel MoE models
Top 31.6% on SourcePulse
This repository provides the Expert Parallelism Load Balancer (EPLB), a tool designed to optimize the distribution of experts across GPUs in Mixture-of-Experts (MoE) models. It addresses the challenge of uneven expert workloads in expert parallelism (EP) by employing redundant expert replication and heuristic packing strategies, aiming to balance GPU loads and minimize inter-node communication for large-scale deployments.
How It Works
EPLB implements two load-balancing policies: Hierarchical and Global. Hierarchical Load Balancing applies when the number of server nodes evenly divides the number of expert groups; it keeps experts of the same group on the same node to reduce inter-node traffic. The policy first balances expert groups across nodes, then replicates experts within each node, and finally packs the replicas onto individual GPUs. Global Load Balancing covers the remaining cases: it replicates experts globally and packs them onto GPUs without considering expert groups, which suits larger EP sizes.
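The selection logic and the replicate-then-pack steps can be illustrated with a minimal sketch. The function names (choose_policy, replicate_heaviest, greedy_pack) and the specific greedy heuristics are illustrative assumptions that follow the description above, not the code in eplb.py:

```python
import torch

def choose_policy(num_groups: int, num_nodes: int) -> str:
    # Hierarchical balancing applies only when expert groups split evenly
    # across nodes; every other configuration falls back to global balancing.
    return "hierarchical" if num_groups % num_nodes == 0 else "global"

def replicate_heaviest(load: torch.Tensor, num_replicas: int) -> torch.Tensor:
    # Give every expert one replica, then hand the remaining replica budget to
    # whichever expert currently carries the highest load per replica.
    counts = torch.ones(load.numel(), dtype=torch.long)
    for _ in range(num_replicas - load.numel()):
        counts[(load / counts).argmax()] += 1
    return counts

def greedy_pack(replica_load: torch.Tensor, num_gpus: int) -> list[list[int]]:
    # Classic greedy packing: visit replicas in descending load order and
    # place each one on the currently least-loaded GPU.
    gpus: list[list[int]] = [[] for _ in range(num_gpus)]
    gpu_load = torch.zeros(num_gpus)
    for idx in torch.argsort(replica_load, descending=True).tolist():
        target = int(gpu_load.argmin())
        gpus[target].append(idx)
        gpu_load[target] += replica_load[idx]
    return gpus
```

Under the hierarchical policy, the replicate-then-pack steps above are applied within each node rather than across the whole cluster.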
Quick Start & Requirements
Install via pip install eplb (assuming the package is published; otherwise, import the eplb module directly from the repository). The main entry point is eplb.rebalance_experts, which computes a balanced expert-to-GPU placement from the expected per-expert loads.
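A usage sketch follows. The parameter list and the returned mappings are assumptions based on the reference eplb.py (per-layer expected expert loads plus replica, group, node, and GPU counts); check the repository for the exact interface:

```python
import torch
import eplb  # the eplb module from the repository

# Expected load for each expert, per MoE layer (2 layers x 8 experts here);
# these statistics must be supplied by the serving system.
weight = torch.tensor([
    [ 90, 132,  40,  61, 104, 165,  39,   4],
    [ 20, 107, 104,  64,  19, 197, 187, 157],
])

num_replicas = 12  # physical experts per layer (logical experts + redundant copies)
num_groups = 4     # expert groups per layer
num_nodes = 2      # server nodes (divides num_groups, so hierarchical balancing applies)
num_gpus = 4       # total GPUs across all nodes

# Returns the placement: a physical-to-logical expert mapping, its inverse, and
# the replica count per logical expert (names assumed from the reference code).
phy2log, log2phy, logcnt = eplb.rebalance_experts(
    weight, num_replicas, num_groups, num_nodes, num_gpus
)
print(phy2log)  # replica-to-logical-expert assignment per layer
```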
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The repository focuses solely on the load-balancing algorithm and does not include a mechanism for predicting expert loads, even though the predicted load is the algorithm's critical input. The effectiveness of the balancing therefore depends on the accuracy of the externally supplied load estimates.
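Since load prediction is out of scope, deployments must supply their own estimates, for example a moving average of recent per-expert token counts. A minimal sketch with a hypothetical update_load_estimate helper (not part of this repository):

```python
import torch

def update_load_estimate(prev_estimate: torch.Tensor,
                         tokens_per_expert: torch.Tensor,
                         decay: float = 0.9) -> torch.Tensor:
    """Exponential moving average of observed per-expert token counts.

    `tokens_per_expert` holds how many tokens each expert processed in the
    last window; the returned tensor can be passed to the load balancer as
    the expected expert load.
    """
    return decay * prev_estimate + (1.0 - decay) * tokens_per_expert.float()
```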
Last activity: 4 months ago; the project is currently marked inactive.