Large-scale distributed parallel training toolbox
LiBai is a distributed parallel training toolbox for large-scale AI models, built on OneFlow. It targets researchers and engineers who need to train complex models efficiently across multiple devices and nodes, and it offers a flexible, modular framework for both Computer Vision and Natural Language Processing tasks.
How It Works
LiBai combines several parallelism strategies (data, tensor, and pipeline) with training techniques (mixed precision, activation checkpointing, and ZeRO) in a modular design. Its LazyConfig system offers flexible configuration syntax and structure, so users can build custom research projects or rely on the built-in trainer and engine for streamlined development.
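To make the interaction between the three parallelism strategies concrete, here is a minimal pure-Python sketch of how a fixed pool of devices can be factored into data, tensor, and pipeline groups. It does not use LiBai or OneFlow: the `ParallelLayout` class, its field names, and the rank ordering (tensor index varying fastest) are all assumptions made for illustration, and LiBai's actual configuration keys and device placement may differ.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical helper: splits world_size devices into a 3D parallel grid."""

    world_size: int
    tensor_parallel_size: int
    pipeline_parallel_size: int

    def __post_init__(self) -> None:
        model_parallel = self.tensor_parallel_size * self.pipeline_parallel_size
        if model_parallel <= 0 or self.world_size % model_parallel != 0:
            raise ValueError(
                "world_size must be a positive multiple of "
                "tensor_parallel_size * pipeline_parallel_size"
            )

    @property
    def data_parallel_size(self) -> int:
        # Devices left over after model parallelism hold replicas of the model.
        return self.world_size // (
            self.tensor_parallel_size * self.pipeline_parallel_size
        )

    def coords(self, rank: int) -> Tuple[int, int, int]:
        """Return (pipeline_stage, data_replica, tensor_shard) for a global rank.

        Assumed ordering: tensor shard varies fastest, then data replica,
        then pipeline stage.
        """
        if not 0 <= rank < self.world_size:
            raise ValueError("rank out of range")
        tensor = rank % self.tensor_parallel_size
        data = (rank // self.tensor_parallel_size) % self.data_parallel_size
        pipeline = rank // (self.tensor_parallel_size * self.data_parallel_size)
        return pipeline, data, tensor


if __name__ == "__main__":
    layout = ParallelLayout(
        world_size=8, tensor_parallel_size=2, pipeline_parallel_size=2
    )
    for rank in range(layout.world_size):
        print(rank, layout.coords(rank))
```

The point of the sketch is the constraint itself: the product of the three group sizes must equal the number of devices, so choosing the tensor and pipeline sizes fixes how many data-parallel replicas remain.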
Quick Start & Requirements
Highlighted Details
Maintenance & Community
The project's latest update was the Beta 0.3.0 release on March 11, 2024. Contributions are welcome (see CONTRIBUTING), and a WeChat group is available.
Licensing & Compatibility
Released under the Apache 2.0 license. This permissive license allows for commercial use and integration into closed-source projects.
Limitations & Caveats
The main branch is tied to OneFlow 0.7.0. Some models, like Stable Diffusion, are not yet fully supported for 3D parallel training.