deepseek-ai/LPLB: MoE load balancer optimizing expert workload distribution via linear programming
Summary
deepseek-ai/LPLB is an early-stage research project introducing a parallel load balancer for MoE models. It leverages linear programming to optimize expert workload distribution, targeting researchers and engineers working with MoE architectures. The primary benefit is mitigating dynamic load imbalances during training by intelligently reordering experts and assigning tokens.
How It Works
LPLB extends the Expert Parallelism Load Balancer (EPLB) by employing linear programming (LP) to dynamically rebalance token assignments on a per-batch basis. It formulates the load balancing problem to minimize imbalance within an expert-parallel group, while respecting edge capacities defined by token counts. Real-time workload statistics are synchronized efficiently using NVLINK and NVSHMEM, significantly reducing communication overhead compared to standard distributed primitives like torch.distributed.allreduce.
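The per-batch rebalancing idea can be illustrated with a toy sketch. Everything below (the `rebalance` function, the two-rank setup) is an invented illustration, not LPLB's actual solver: the real planner solves a linear program over an arbitrary redundant-expert topology on the GPU, whereas this tiny two-rank instance admits a closed-form answer.

```python
# Toy sketch of LPLB-style per-batch rebalancing (illustrative only; the real
# planner solves an LP over an arbitrary replica topology).
# Setup: rank 0 hosts expert A; rank 1 hosts expert B plus a redundant replica
# of A. Tokens routed to A may be redirected to its replica, bounded by the
# edge capacity (the tokens actually assigned to A in this batch).

def rebalance(tokens_a, tokens_b):
    """Choose a flow f (tokens moved from A on rank 0 to A's replica on
    rank 1) minimizing the max per-rank load, subject to 0 <= f <= tokens_a.
    Resulting loads: rank0 = tokens_a - f, rank1 = tokens_b + f."""
    # Perfect balance wants tokens_a - f == tokens_b + f, i.e. f = (a - b) / 2,
    # clamped to the feasible interval [0, tokens_a] (tokens only flow A -> replica).
    f = max(0.0, min(float(tokens_a), (tokens_a - tokens_b) / 2.0))
    return tokens_a - f, tokens_b + f

print(rebalance(100, 60))  # -> (80.0, 80.0): imbalance fully removed
print(rebalance(40, 60))   # -> (40.0, 60.0): no B -> A edge exists, so f = 0
```

The second call shows why topology matters: with no redundant replica of B on rank 0, an overload on rank 1 cannot be shifted back, mirroring the caveat below that LPLB can underperform under certain global imbalances.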
Quick Start & Requirements
Run ./download-mathdx.sh, set NVSHMEM_DIR=..., then run pip install --no-build-isolation . (or pip install --no-build-isolation --editable . for development/testing).
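Collected as a shell snippet (the NVSHMEM path is elided in the README and must be filled in for your environment):

```shell
# Quick-start steps from the README above.
./download-mathdx.sh                      # fetch the MathDx dependency
export NVSHMEM_DIR=...                    # set to your NVSHMEM installation path
pip install --no-build-isolation .        # regular install
# or, for development/testing:
pip install --no-build-isolation --editable .
```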
Highlighted Details
Workload statistics are synchronized via NVSHMEM over NVLINK rather than torch.distributed.allreduce to minimize communication latency. The redundant-expert topology is specified via an r2o matrix (mapping redundant experts to their originals).

Maintenance & Community
No specific details regarding notable contributors, sponsorships, partnerships, or community channels (e.g., Discord, Slack) are provided in the README.
Licensing & Compatibility
The README does not specify a license type or provide compatibility notes relevant for commercial use or closed-source linking.
Limitations & Caveats
The current planner optimizes for total token count, not the non-linear computational costs of grouped matrix multiplications, which may lead to suboptimal performance. Solver latency (~100 µs intra-node) can be non-negligible for small batches. Under extreme global load imbalance, LPLB may perform worse than EPLB due to differences in assigning redundant experts.