Load balancer for expert-parallel MoE models
Top 31.6% on SourcePulse
This repository provides the Expert Parallelism Load Balancer (EPLB), a tool designed to optimize the distribution of experts across GPUs in Mixture-of-Experts (MoE) models. It addresses the challenge of uneven expert workloads in expert parallelism (EP) by employing redundant expert replication and heuristic packing strategies, aiming to balance GPU loads and minimize inter-node communication for large-scale deployments.
How It Works
EPLB implements two load-balancing policies: Hierarchical and Global. Hierarchical Load Balancing applies when the number of server nodes evenly divides the number of expert groups; it keeps experts of the same group on the same node to reduce inter-node traffic. The policy first balances expert groups across nodes, then replicates experts within each node, and finally packs the replicas onto individual GPUs. Global Load Balancing covers the remaining cases: it replicates experts globally and packs them onto GPUs without considering expert groups, which suits larger EP sizes.
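The selection logic and the replicate-then-pack steps can be illustrated with a minimal sketch. The function names (choose_policy, replicate_heaviest, greedy_pack) and the specific greedy heuristics are illustrative assumptions that follow the description above, not the code in eplb.py:

```python
import torch

def choose_policy(num_groups: int, num_nodes: int) -> str:
    # Hierarchical balancing applies only when expert groups split evenly
    # across nodes; every other configuration falls back to global balancing.
    return "hierarchical" if num_groups % num_nodes == 0 else "global"

def replicate_heaviest(load: torch.Tensor, num_replicas: int) -> torch.Tensor:
    # Give every expert one replica, then hand the remaining replica budget to
    # whichever expert currently carries the highest load per replica.
    counts = torch.ones(load.numel(), dtype=torch.long)
    for _ in range(num_replicas - load.numel()):
        counts[(load / counts).argmax()] += 1
    return counts

def greedy_pack(replica_load: torch.Tensor, num_gpus: int) -> list[list[int]]:
    # Classic greedy packing: visit replicas in descending load order and
    # place each one on the currently least-loaded GPU.
    gpus: list[list[int]] = [[] for _ in range(num_gpus)]
    gpu_load = torch.zeros(num_gpus)
    for idx in torch.argsort(replica_load, descending=True).tolist():
        target = int(gpu_load.argmin())
        gpus[target].append(idx)
        gpu_load[target] += replica_load[idx]
    return gpus
```

Under the hierarchical policy, the replicate-then-pack steps above are applied within each node rather than across the whole cluster.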
Quick Start & Requirements
Install via pip install eplb (assuming the package is published; otherwise, import the eplb module directly from the repository). The main entry point is eplb.rebalance_experts, which computes a balanced expert-to-GPU placement from the expected per-expert loads.
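A usage sketch follows. The parameter list and the returned mappings are assumptions based on the reference eplb.py (per-layer expected expert loads plus replica, group, node, and GPU counts); check the repository for the exact interface:

```python
import torch
import eplb  # the eplb module from the repository

# Expected load for each expert, per MoE layer (2 layers x 8 experts here);
# these statistics must be supplied by the serving system.
weight = torch.tensor([
    [ 90, 132,  40,  61, 104, 165,  39,   4],
    [ 20, 107, 104,  64,  19, 197, 187, 157],
])

num_replicas = 12  # physical experts per layer (logical experts + redundant copies)
num_groups = 4     # expert groups per layer
num_nodes = 2      # server nodes (divides num_groups, so hierarchical balancing applies)
num_gpus = 4       # total GPUs across all nodes

# Returns the placement: a physical-to-logical expert mapping, its inverse, and
# the replica count per logical expert (names assumed from the reference code).
phy2log, log2phy, logcnt = eplb.rebalance_experts(
    weight, num_replicas, num_groups, num_nodes, num_gpus
)
print(phy2log)  # replica-to-logical-expert assignment per layer
```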
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The repository focuses solely on the load-balancing algorithm and does not include a mechanism for predicting expert loads, even though the predicted load is the algorithm's critical input. The effectiveness of the balancing therefore depends on the accuracy of the externally supplied load estimates.
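Since load prediction is out of scope, deployments must supply their own estimates, for example a moving average of recent per-expert token counts. A minimal sketch with a hypothetical update_load_estimate helper (not part of this repository):

```python
import torch

def update_load_estimate(prev_estimate: torch.Tensor,
                         tokens_per_expert: torch.Tensor,
                         decay: float = 0.9) -> torch.Tensor:
    """Exponential moving average of observed per-expert token counts.

    `tokens_per_expert` holds how many tokens each expert processed in the
    last window; the returned tensor can be passed to the load balancer as
    the expected expert load.
    """
    return decay * prev_estimate + (1.0 - decay) * tokens_per_expert.float()
```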
Last activity: 4 months ago; the project is currently marked inactive.