Noumena-Network/nmoe: High-performance MoE trainer for NVIDIA B200 GPUs
Noumena-Network/nmoe provides an opinionated Mixture-of-Experts (MoE) trainer specifically engineered for NVIDIA Blackwell B200 GPUs. It addresses the performance bottlenecks in traditional MoE training by implementing a novel expert parallelism strategy, RDEP, enabling efficient large-scale model training for researchers and power users.
How It Works
The core innovation is RDEP (Remote Direct Memory Access Event-driven Parallelism), which replaces standard NCCL all-to-all collectives for expert communication. Instead of global synchronization, RDEP dispatches tokens directly to expert owners using NVSHMEM for inter-node and CUDA IPC for intra-node communication. This direct put-based approach eliminates collective barriers and waiting, significantly improving communication efficiency and throughput for MoE layers.
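As a rough illustration of the dispatch problem RDEP targets, the sketch below shows the routing step that precedes any MoE communication: top-k gating followed by grouping token indices by the rank that owns each chosen expert. This is not nmoe's code; the function name, expert layout, and shapes are illustrative assumptions, and the per-rank groups stand in for what a put-based scheme would write directly into each destination's receive buffer.

```python
import torch

def route_tokens(hidden, gate_weight, top_k=2, experts_per_rank=8):
    # hidden: [num_tokens, d_model]; gate_weight: [d_model, num_experts]
    logits = hidden @ gate_weight                          # router score per expert
    probs = torch.softmax(logits, dim=-1)
    topk_probs, topk_experts = probs.topk(top_k, dim=-1)   # [num_tokens, top_k]
    owner_rank = topk_experts // experts_per_rank          # rank hosting each chosen expert
    num_ranks = gate_weight.size(-1) // experts_per_rank
    # Group flat token indices by destination rank; a put-based dispatcher would
    # write each group straight into that rank's receive buffer instead of
    # synchronizing through a global all-to-all.
    token_idx = torch.arange(hidden.size(0)).repeat_interleave(top_k)
    per_rank = {r: token_idx[owner_rank.flatten() == r] for r in range(num_ranks)}
    return topk_probs, topk_experts, per_rank

# Example: 16 tokens of width 128 routed over 64 experts spread across 8 ranks.
tokens = torch.randn(16, 128)
gate = torch.randn(128, 64)
probs, experts, groups = route_tokens(tokens, gate)
```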
Quick Start & Requirements
This repository is container-first, requiring Docker. The primary prerequisite is NVIDIA Blackwell B200 GPUs (sm_100a).
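Because there is no fallback path for older architectures, it can be worth confirming that the visible devices actually report Blackwell's compute capability before launching a long job. The check below is a hypothetical convenience, not part of nmoe; it only assumes that sm_100a devices report compute capability 10.x through PyTorch.

```python
import torch

# Hypothetical pre-flight check (not part of nmoe): confirm every visible GPU
# reports Blackwell's compute capability (10.x, i.e. sm_100-class) before launch.
assert torch.cuda.is_available(), "CUDA is not available"
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)} (sm_{major}{minor})")
    if major != 10:
        raise SystemExit(f"GPU {i} is not a Blackwell-class device; nmoe targets sm_100a")
```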
Build the container images from docker/Dockerfile.base and docker/Dockerfile.train.
Containerized training run: docker run --gpus all -v /data:/data xjdr/nmoe_train:latest python -m nmoe.train configs/moonlet.toml
Single-node multi-GPU run: torchrun --standalone --nproc_per_node=8 -m nmoe.train configs/moonlight.toml
Distributed runs use the xjdr/nmoe_dist:latest image (requires NVSHMEM) and the k8s manifests.
Data is expected as .npy shards; preprocessing example: python -m nmoe.data.cli prep --source hf --dataset HuggingFaceFW/fineweb-edu --output /data/fineweb_edu --name fineweb_edu
Further documentation: nmoe/data/README.md, nviz/README.md.
Highlighted Details
Maintenance & Community
The project adopts a narrow, opinionated stance, focusing on specific hardware and parallelism strategies. No explicit community channels (Discord/Slack) or roadmap links are provided in the README.
Licensing & Compatibility
Limitations & Caveats
Strictly limited to NVIDIA Blackwell B200 (sm_100a) hardware; no support for H100/A100 or fallback paths. Tensor parallelism is not implemented, and NCCL all-to-all is explicitly excluded from the MoE communication path.