Toolkit for easy model parallelization
Top 45.3% on sourcepulse
This library simplifies deploying large HuggingFace Transformer models across multiple GPUs for inference, enabling users to run models exceeding single-GPU memory capacity. It targets researchers and developers needing to deploy large language, vision, or speech models efficiently and cost-effectively.
How It Works
Parallelformers uses Megatron-LM-inspired model parallelism to split a model across the available GPUs, slicing each layer's parameters among the devices. This pools the memory of every GPU, so a single model too large for one device can still be served. The library handles device placement and inter-GPU communication automatically, removing the need for manual model partitioning.
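As a rough illustration of the underlying idea (a plain-PyTorch sketch, not the library's actual internals), Megatron-style tensor parallelism shards a linear layer's weight matrix across GPUs, computes partial outputs locally, and gathers the results:

```python
import torch
import torch.nn.functional as F

# Didactic sketch only; assumes two CUDA devices are visible.
devices = ["cuda:0", "cuda:1"]

hidden, out_features = 1024, 4096
weight = torch.randn(out_features, hidden)  # full weight, shape [out, in]
# Split the output dimension in two and place one shard on each GPU.
shards = [w.to(d) for w, d in zip(torch.chunk(weight, 2, dim=0), devices)]

def column_parallel_linear(x: torch.Tensor) -> torch.Tensor:
    """Each GPU multiplies the input by its own weight shard; the partial
    outputs are moved to the first device and concatenated."""
    partials = [F.linear(x.to(d), w) for w, d in zip(shards, devices)]
    return torch.cat([p.to(devices[0]) for p in partials], dim=-1)

x = torch.randn(8, hidden)
y = column_parallel_linear(x)  # shape [8, out_features]
```

Parallelformers automates this kind of slicing and the associated data movement for supported HuggingFace architectures.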
Quick Start & Requirements
pip install parallelformers
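A minimal end-to-end sketch, assuming the `parallelize` entry point described in the upstream documentation (exact arguments such as `num_gpus`, `fp16`, and `verbose` may vary by version; the model name is just an example):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from parallelformers import parallelize

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")

# Shard the model across 2 GPUs for inference; fp16 roughly halves memory use.
parallelize(model, num_gpus=2, fp16=True, verbose="detail")

inputs = tokenizer("Parallelformers makes multi-GPU inference", return_tensors="pt")
outputs = model.generate(**inputs, max_length=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```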
Highlighted Details
Maintenance & Community
The repository's last commit was about two years ago, and the project is marked inactive.
Licensing & Compatibility
Limitations & Caveats
The library currently only supports inference, with no training capabilities. Some models like BigBird and ProphetNet are only partially supported, and others like SqueezeBERT are unsupported. The project's last update was in October 2021, indicating potential staleness.