EPLB by deepseek-ai

Load balancer for expert-parallel MoE models

created 5 months ago
1,244 stars

Top 31.6% on SourcePulse

Project Summary

This repository provides the Expert Parallelism Load Balancer (EPLB), a tool designed to optimize the distribution of experts across GPUs in Mixture-of-Experts (MoE) models. It addresses the challenge of uneven expert workloads in expert parallelism (EP) by employing redundant expert replication and heuristic packing strategies, aiming to balance GPU loads and minimize inter-node communication for large-scale deployments.

How It Works

EPLB implements two load-balancing policies: Hierarchical and Global. Hierarchical Load Balancing applies when the number of server nodes divides the number of expert groups; it keeps experts of the same group on the same node to reduce inter-node traffic. It proceeds in three steps: balance expert groups across nodes, replicate experts within each node, and pack the replicas onto individual GPUs. Global Load Balancing covers the remaining cases: it replicates experts globally and packs them onto GPUs without regard to groups, which suits larger EP sizes.
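
A minimal sketch of the policy choice described above (illustrative only; choose_policy is not part of the EPLB API, and the parameter names follow the rebalance_experts arguments mentioned below):

    # Illustrative sketch (not EPLB's code): the policy selection rule
    # described above, using the num_groups/num_nodes parameters of
    # eplb.rebalance_experts.
    def choose_policy(num_groups: int, num_nodes: int) -> str:
        # Hierarchical balancing requires expert groups to partition
        # evenly across server nodes.
        if num_groups % num_nodes == 0:
            return "hierarchical"  # groups -> nodes, replicate within nodes, pack onto GPUs
        return "global"  # replicate experts globally, then pack onto GPUs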

Quick Start & Requirements

  • Install: no PyPI package is indicated in the README; clone the repository and import the eplb module directly.
  • Prerequisites: PyTorch. The algorithm takes estimated expert loads as input; producing those estimates is outside the scope of this repository.
  • Example usage: the README includes a Python snippet demonstrating eplb.rebalance_experts (see the sketch after this list).
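
A usage sketch in the spirit of the README snippet (the load values here are illustrative, and the signature is assumed from the README's example):

    import torch
    import eplb

    # Estimated per-expert load for two MoE layers (illustrative values;
    # real load statistics must come from the serving system).
    weight = torch.tensor([[ 90, 132,  40,  61, 104, 165,  39,   4,  73,  56, 183,  86],
                           [ 20, 107, 104,  64,  19, 197, 187, 157, 172,  86,  16,  27]])

    num_replicas = 16  # physical experts per layer, including redundant copies
    num_groups = 4     # expert groups per layer
    num_nodes = 2      # server nodes; 2 divides 4, so the hierarchical policy applies
    num_gpus = 8       # total GPUs

    # Returns the physical-to-logical expert map, its inverse, and the
    # replica count of each logical expert.
    phy2log, log2phy, logcnt = eplb.rebalance_experts(
        weight, num_replicas, num_groups, num_nodes, num_gpus)
    print(phy2log)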

Highlighted Details

  • Implements both Hierarchical and Global load balancing policies for different EP configurations.
  • Supports redundant expert replication to mitigate load imbalance.
  • Aims to reduce inter-node traffic by co-locating experts from the same group.
  • Provides a clear Python interface for integrating the load balancing logic.

Maintenance & Community

  • Developed by deepseek-ai.
  • No specific community links (Discord, Slack) or roadmap are mentioned in the README.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: the permissive MIT license allows commercial use and integration into closed-source projects.

Limitations & Caveats

The repository covers only the load-balancing algorithm itself; it includes no mechanism for estimating expert loads, which are the algorithm's required input. The quality of the resulting placement therefore depends on the accuracy of externally supplied load estimates.
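
Since load estimates must come from the serving stack, one plausible approach (hypothetical, not part of EPLB) is an exponential moving average of per-expert token counts:

    import torch

    def update_expert_loads(ema_loads: torch.Tensor,
                            tokens_per_expert: torch.Tensor,
                            decay: float = 0.9) -> torch.Tensor:
        # Exponential moving average over [num_layers, num_experts] token
        # counts; the result could be passed to eplb.rebalance_experts as
        # the weight argument. Names and decay value are illustrative.
        return decay * ema_loads + (1.0 - decay) * tokens_per_expert.float()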

Health Check

  • Last commit: 4 months ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 9 stars in the last 30 days

Explore Similar Projects

exo by exo-explore

  • 30k stars · top 2.3%
  • AI cluster for running models on diverse devices
  • created 1 year ago · updated 4 months ago
  • Starred by Anton Bukov (Cofounder of 1inch Network), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 12 more.

Made-With-ML by GokuMohandas

  • 42k stars · top 0.5%
  • ML course for production-grade applications
  • created 6 years ago · updated 1 year ago
  • Starred by Philipp Moritz (Cofounder of Anyscale), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 10 more.