Tool for efficient large DNN model training on commodity hardware
Varuna is a PyTorch library designed for efficient, scalable, and cost-effective training of large deep learning models on commodity hardware. It targets researchers and practitioners working with massive models that exceed the memory capacity of single GPUs, offering a solution that combines pipeline and data parallelism with dynamic resource adaptation.
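To see why single-GPU memory is the binding constraint, here is a rough back-of-envelope estimate of fp32 training memory. This is illustrative arithmetic only, not taken from Varuna's docs; the 4x multiplier (weights + gradients + two Adam moments) is a common rule of thumb.

```python
# Illustrative sketch: fp32 training memory for a model with N parameters.
# Assumes weights + gradients + Adam first/second moments, each fp32 (4 bytes),
# i.e. roughly 4x the weight memory. Activations are ignored here.
def training_memory_gb(num_params, bytes_per_param=4, state_multiplier=4):
    """Approximate GB of GPU memory needed just for training state."""
    return num_params * bytes_per_param * state_multiplier / 1e9

# A 2.5B-parameter model needs ~40 GB of state, already beyond a 16 GB GPU,
# which is why the model must be partitioned across devices.
print(training_memory_gb(2.5e9))  # -> 40.0
```

Activations and framework overhead push the real number higher still; partitioning the model across GPUs (pipeline parallelism) is one way to fit it.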
How It Works
Varuna implements a hybrid parallelism strategy, interleaving pipeline parallelism (PP) and data parallelism (DP). Models are partitioned into sequential stages using `CutPoint` annotations within the model definition. These stages are then distributed across the available GPUs, and data parallelism is applied across replicas of the pipeline. Breaking a large model into distributed stages makes efficient use of memory and compute, while the hybrid design aims to balance communication and computation overheads.
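The PP x DP layout described above can be sketched with a small, self-contained helper. This is a hypothetical illustration of the mapping, not Varuna's actual scheduling code: each GPU rank is assigned a (data-parallel replica, pipeline stage) pair.

```python
# Sketch: mapping GPU ranks to a hybrid pipeline/data-parallel layout.
# Hypothetical helper for illustration; not part of Varuna's API.
def hybrid_layout(num_gpus, pipeline_depth):
    """Return the data-parallel degree and a rank -> (replica, stage) map."""
    assert num_gpus % pipeline_depth == 0, "GPUs must split evenly into replicas"
    dp_degree = num_gpus // pipeline_depth
    layout = {}
    for rank in range(num_gpus):
        replica = rank // pipeline_depth  # which copy of the pipeline
        stage = rank % pipeline_depth     # which sequential model stage
        layout[rank] = (replica, stage)
    return dp_degree, layout

# Example: 8 GPUs with a 4-stage pipeline -> 2 data-parallel replicas.
dp, layout = hybrid_layout(8, 4)
print(dp)          # -> 2
print(layout[5])   # -> (1, 1): GPU 5 holds stage 1 of replica 1
```

Stages within one replica exchange activations (pipeline communication), while matching stages across replicas all-reduce gradients (data-parallel communication).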
Quick Start & Requirements
First, install NVIDIA Apex, applying the provided `apex.patch` before building:
```shell
git clone https://github.com/NVIDIA/apex
cp apex.patch /path/to/apex/
cd /path/to/apex
git apply apex.patch
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
```
Then, install Varuna:
```shell
git clone <varuna_repo>
cd varuna
python setup.py install
```
Use `run_varuna.py` for distributed execution. Documentation is in the `docs/` folder (`html/index.html`, `varuna.pdf`). Examples for BERT and Megatron-LM are in `examples/`.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Partitioning requires manually annotating the model with `CutPoint` instances.