parallelformers by tunib-ai

Toolkit for easy model parallelization

created 4 years ago
790 stars

Top 45.3% on sourcepulse

View on GitHub
Project Summary

This library simplifies deploying large HuggingFace Transformer models across multiple GPUs for inference, enabling users to run models exceeding single-GPU memory capacity. It targets researchers and developers needing to deploy large language, vision, or speech models efficiently and cost-effectively.

How It Works

Parallelformers applies model parallelism inspired by Megatron-LM: a model's weight matrices are sliced and distributed across the available GPUs, so a single large model can run across multiple devices. This effectively pools GPU memory, enabling inference for models that would otherwise not fit on one GPU. The library handles device placement and inter-GPU communication automatically, abstracting away manual model partitioning.
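
To illustrate the idea (a conceptual sketch in plain PyTorch, not parallelformers' own code), a linear layer's weight matrix can be split along its output dimension so each device computes a slice of the result; the layer sizes and device names below are arbitrary.

    import torch
    import torch.nn as nn

    def column_parallel_forward(x, full_weight, devices):
        # Conceptual Megatron-style column parallelism: split the
        # (out_features, in_features) weight along the output dimension,
        # compute each shard on its own device, then gather the partial outputs.
        shards = full_weight.chunk(len(devices), dim=0)
        partials = []
        for shard, device in zip(shards, devices):
            partials.append(x.to(device) @ shard.to(device).T)  # (batch, out/n_devices)
        return torch.cat([p.to(devices[0]) for p in partials], dim=-1)

    layer = nn.Linear(1024, 4096, bias=False)  # arbitrary sizes
    x = torch.randn(2, 1024)
    devices = ["cuda:0", "cuda:1"] if torch.cuda.device_count() >= 2 else ["cpu", "cpu"]
    out = column_parallel_forward(x, layer.weight.data, devices)
    print(out.shape)  # torch.Size([2, 4096]) -- same shape as the unsplit layer's output

In the library itself, this kind of slicing and the associated device placement happen automatically for supported models.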

Quick Start & Requirements

  • Install via pip: pip install parallelformers
  • Requires PyTorch and HuggingFace Transformers.
  • Supports inference only.
  • See official documentation for more details.
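
A minimal usage sketch, assuming the library exposes a parallelize() helper that takes the loaded model, the number of GPUs, and an fp16 flag; the model name and GPU count below are illustrative.

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from parallelformers import parallelize

    # Load any supported HuggingFace model on CPU first.
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")

    # Split the model across 2 GPUs; fp16=True roughly halves the weight memory footprint.
    parallelize(model, num_gpus=2, fp16=True)

    # Inference uses the ordinary HuggingFace API; device placement is handled
    # by the library, so the inputs can stay on CPU.
    inputs = tokenizer("Parallelformers is", return_tensors="pt")
    outputs = model.generate(**inputs, max_length=30)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))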

Highlighted Details

  • Enables loading models larger than single GPU memory (e.g., 12GB model on two 8GB GPUs).
  • Supports a wide range of HuggingFace models, including BERT, GPT-2, RoBERTa, ViT, and Wav2Vec2.
  • Offers optional FP16 precision for memory savings and potential speedups.
  • Includes Docker support and guidance on shared memory configuration.

Maintenance & Community

  • Project initiated by Hyunwoong Ko.
  • Last feature update (October 2021) added Docker support.
  • Citation information provided.

Licensing & Compatibility

  • Licensed under the Apache License 2.0.
  • Permissive license suitable for commercial use and integration into closed-source applications.

Limitations & Caveats

The library currently only supports inference, with no training capabilities. Some models like BigBird and ProphetNet are only partially supported, and others like SqueezeBERT are unsupported. The project's last update was in October 2021, indicating potential staleness.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 7 stars in the last 90 days

Explore Similar Projects

Starred by Aravind Srinivas (Cofounder of Perplexity), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 12 more.

DeepSpeed by deepspeedai

Deep learning optimization library for distributed training and inference

Top 0.2% on sourcepulse
40k stars
created 5 years ago, updated 1 day ago