OpenBMB/BMTrain: Training toolkit for large AI models
BMTrain is an open-source toolkit for efficient large-scale model training, covering both pre-training and fine-tuning. It targets researchers and engineers working with models of tens of billions of parameters, and aims to make distributed training feel like stand-alone development.
How It Works
BMTrain integrates with PyTorch: its init_distributed function replaces PyTorch's native distributed initialization. ZeRO optimization is enabled by replacing torch.nn.Module with bmtrain.DistributedModule and torch.nn.Parameter with bmtrain.DistributedParameter. Transformer blocks can be further optimized by wrapping them in bmtrain.Block with a specified ZeRO level, and communication overhead across sequential blocks is reduced by grouping them in a bmtrain.TransformerBlockList, as in the sketch below.
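A minimal sketch of this pattern, assuming BMTrain's documented API; SomeTransformerLayer is a hypothetical stand-in for your own transformer layer, and the tensor shapes are illustrative:

    import torch
    import bmtrain as bmt

    bmt.init_distributed()  # replaces torch.distributed initialization

    class MyModel(bmt.DistributedModule):  # instead of torch.nn.Module
        def __init__(self, num_layers):
            super().__init__()
            # instead of torch.nn.Parameter; sharded across ranks (ZeRO)
            self.pos_emb = bmt.DistributedParameter(torch.empty(1024, 768))
            # bmt.Block applies ZeRO at the block level; TransformerBlockList
            # reduces communication overhead across the sequential blocks
            self.layers = bmt.TransformerBlockList([
                bmt.Block(SomeTransformerLayer(), zero_level=3)  # hypothetical layer class
                for _ in range(num_layers)
            ])

        def forward(self, hidden):
            return self.layers(hidden)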
Quick Start & Requirements
- Install: pip install bmtrain (compiles C/CUDA source code; may take 5-10 minutes).
- Usage: call bmt.init_distributed() at startup, replace PyTorch modules with their BMTrain equivalents, and launch with torch.distributed.launch or torchrun (see the launch sketch below).
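A minimal launch sketch; the script name train.py and the process count are illustrative:

    # train.py (hypothetical file name)
    import bmtrain as bmt

    bmt.init_distributed()  # initialize before any other BMTrain call
    # ... build the model, data loader, and optimizer here ...

    # Launch one process per GPU, e.g. on a 4-GPU node:
    #   torchrun --nproc_per_node=4 train.py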
Highlighted Details
- bmtrain.optim.AdamOffloadOptimizer and bmtrain.lr_scheduler provide optimized training components.
- OptimManager handles optimizer zero-grad, backward, gradient clipping, and step operations (see the sketch after this list).
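A training-step sketch combining these pieces, assuming model and dataloader are already defined and that model returns a scalar loss; the hyperparameters are illustrative:

    import bmtrain as bmt

    optimizer = bmt.optim.AdamOffloadOptimizer(model.parameters(), weight_decay=1e-2)
    lr_scheduler = bmt.lr_scheduler.Noam(optimizer, start_lr=1e-3,
                                         warmup_iter=100, end_iter=1000)

    # OptimManager ties together loss scaling, backward, clipping, and stepping
    optim_manager = bmt.optim.OptimManager(loss_scale=1024)
    optim_manager.add_optimizer(optimizer, lr_scheduler)

    for batch in dataloader:
        optim_manager.zero_grad()      # zero grads for all managed optimizers
        loss = model(batch)            # forward pass returning a scalar loss
        optim_manager.backward(loss)   # loss-scaled backward pass
        optim_manager.clip_grad_norm(optimizer.param_groups, max_norm=1.0)
        optim_manager.step()           # optimizer step + lr scheduler step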
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
BMTrain makes deep modifications to PyTorch's internals, potentially leading to unexpected behavior. Users are advised to submit issues for any observed problems.