parallelformers by tunib-ai

Toolkit for easy model parallelization

Created 4 years ago
790 stars

Top 44.5% on SourcePulse

View on GitHub
Project Summary

This library simplifies deploying large HuggingFace Transformer models across multiple GPUs for inference, enabling users to run models exceeding single-GPU memory capacity. It targets researchers and developers needing to deploy large language, vision, or speech models efficiently and cost-effectively.

How It Works

Parallelformers applies Megatron-LM-style model parallelism: the weight matrices inside each Transformer layer are sliced and the slices are placed on different GPUs, so the model's total memory footprint is spread across all available devices. This makes it possible to run inference on models that are too large for any single GPU. The library handles device placement and inter-GPU communication automatically, so no manual model partitioning is required.

Quick Start & Requirements

  • Install via pip: pip install parallelformers
  • Requires PyTorch and HuggingFace Transformers.
  • Supports inference only.
  • See the official documentation for more details; a short example follows this list.
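
A minimal usage sketch, assuming a host with two visible GPUs: the checkpoint name and generation arguments are illustrative, while parallelize() is the library's documented entry point.

    from transformers import AutoModelForCausalLM, AutoTokenizer
    from parallelformers import parallelize

    # Load the model on CPU first; parallelize() moves the sliced weights onto the GPUs.
    model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-neo-2.7B")
    tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-2.7B")

    # Split the model across two GPUs; fp16=True halves the weight memory.
    parallelize(model, num_gpus=2, fp16=True)

    # Inference goes through the usual HuggingFace generate API.
    inputs = tokenizer("Parallelformers is", return_tensors="pt")
    outputs = model.generate(**inputs, num_beams=5, max_length=30)
    print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])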

Highlighted Details

  • Enables loading models larger than a single GPU's memory (e.g., a 12GB model on two 8GB GPUs; the arithmetic is sketched after this list).
  • Supports a wide range of HuggingFace models, including BERT, GPT-2, RoBERTa, ViT, and Wav2Vec2.
  • Offers optional FP16 precision for memory savings and potential speedups.
  • Includes Docker support and guidance on shared memory configuration.
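
A quick sanity check of the memory arithmetic behind the first bullet: the 6B-parameter count, the even two-way split, and the per_gpu_weight_gb helper below are illustrative assumptions, not part of the library.

    # Back-of-the-envelope estimate of per-GPU weight memory when a model
    # is split evenly across several GPUs (hypothetical helper, for illustration only).
    def per_gpu_weight_gb(num_params: float, bytes_per_param: int, num_gpus: int) -> float:
        return num_params * bytes_per_param / num_gpus / 1024 ** 3

    # ~6B parameters in FP16 (2 bytes each) is roughly a 12 GB model;
    # split over two GPUs, each holds about 5.6 GiB of weights, fitting on an 8 GB card.
    print(per_gpu_weight_gb(6e9, 2, 2))   # ~5.59

    # The same model in FP32 (4 bytes each) would need about 11.2 GiB per GPU,
    # which no longer fits, illustrating why the optional FP16 mode matters.
    print(per_gpu_weight_gb(6e9, 4, 2))   # ~11.18

Activations and generation buffers add to the weight footprint, so some headroom beyond the raw weight size is still needed on each device.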

Maintenance & Community

  • Project initiated by Hyunwoong Ko.
  • Last updated October 2021 with Docker support.
  • Citation information provided.

Licensing & Compatibility

  • Licensed under the Apache License 2.0.
  • Permissive license suitable for commercial use and integration into closed-source applications.

Limitations & Caveats

The library currently only supports inference, with no training capabilities. Some models like BigBird and ProphetNet are only partially supported, and others like SqueezeBERT are unsupported. The project's last update was in October 2021, indicating potential staleness.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 30 days

Starred by Tobi Lutke (Cofounder of Shopify), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 11 more.

Explore Similar Projects

ctransformers by marella

0.1% · 2k stars
Python bindings for fast Transformer model inference
Created 2 years ago · Updated 1 year ago
Starred by Alex Yu (Research Scientist at OpenAI; Former Cofounder of Luma AI) and Cody Yu (Coauthor of vLLM; MTS at OpenAI).

xDiT by xdit-project

0.7% · 2k stars
Inference engine for parallel Diffusion Transformer (DiT) deployment
Created 1 year ago · Updated 1 day ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Jeff Hammerbacher (Cofounder of Cloudera), and 4 more.

gemma_pytorch by google

0.2% · 6k stars
PyTorch implementation for Google's Gemma models
Created 1 year ago · Updated 3 months ago
Starred by Nat Friedman (Former CEO of GitHub), Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and 15 more.

FasterTransformer by NVIDIA

0.1% · 6k stars
Optimized transformer library for inference
Created 4 years ago · Updated 1 year ago