Toolkit for easy model parallelization
Top 45.3% on sourcepulse
This library simplifies deploying large HuggingFace Transformer models across multiple GPUs for inference, enabling users to run models exceeding single-GPU memory capacity. It targets researchers and developers needing to deploy large language, vision, or speech models efficiently and cost-effectively.
How It Works
Parallelformers uses Megatron-LM-inspired model parallelism to split a model across the available GPUs, slicing each layer's parameters among the devices. This pools the memory of every GPU, so a single model too large for one device can still be served. The library handles device placement and inter-GPU communication automatically, removing the need for manual model partitioning.
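As a rough illustration of the underlying idea (a plain-PyTorch sketch, not the library's actual internals), Megatron-style tensor parallelism shards a linear layer's weight matrix across GPUs, computes partial outputs locally, and gathers the results:

```python
import torch
import torch.nn.functional as F

# Didactic sketch only; assumes two CUDA devices are visible.
devices = ["cuda:0", "cuda:1"]

hidden, out_features = 1024, 4096
weight = torch.randn(out_features, hidden)  # full weight, shape [out, in]
# Split the output dimension in two and place one shard on each GPU.
shards = [w.to(d) for w, d in zip(torch.chunk(weight, 2, dim=0), devices)]

def column_parallel_linear(x: torch.Tensor) -> torch.Tensor:
    """Each GPU multiplies the input by its own weight shard; the partial
    outputs are moved to the first device and concatenated."""
    partials = [F.linear(x.to(d), w) for w, d in zip(shards, devices)]
    return torch.cat([p.to(devices[0]) for p in partials], dim=-1)

x = torch.randn(8, hidden)
y = column_parallel_linear(x)  # shape [8, out_features]
```

Parallelformers automates this kind of slicing and the associated data movement for supported HuggingFace architectures.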
Quick Start & Requirements
pip install parallelformers
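A minimal end-to-end sketch, assuming the `parallelize` entry point described in the upstream documentation (exact arguments such as `num_gpus`, `fp16`, and `verbose` may vary by version; the model name is just an example):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from parallelformers import parallelize

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")

# Shard the model across 2 GPUs for inference; fp16 roughly halves memory use.
parallelize(model, num_gpus=2, fp16=True, verbose="detail")

inputs = tokenizer("Parallelformers makes multi-GPU inference", return_tensors="pt")
outputs = model.generate(**inputs, max_length=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```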
Highlighted Details
Maintenance & Community
The repository's last commit was about two years ago, and the project is marked inactive.
Licensing & Compatibility
Limitations & Caveats
The library currently only supports inference, with no training capabilities. Some models like BigBird and ProphetNet are only partially supported, and others like SqueezeBERT are unsupported. The project's last update was in October 2021, indicating potential staleness.