varuna by microsoft

Tool for efficient large DNN model training on commodity hardware

Created 4 years ago
252 stars

Top 99.6% on SourcePulse

Project Summary

Varuna is a PyTorch library designed for efficient, scalable, and cost-effective training of large deep learning models on commodity hardware. It targets researchers and practitioners working with massive models that exceed the memory capacity of single GPUs, offering a solution that combines pipeline and data parallelism with dynamic resource adaptation.

How It Works

Varuna implements a hybrid parallelism strategy, interleaving pipeline parallelism (PP) and data parallelism (DP). Models are partitioned into sequential stages using CutPoint annotations within the model definition. These stages are then distributed across available GPUs. Data parallelism is applied across replicas of this pipeline. This approach allows for efficient utilization of memory and compute by breaking down large models and distributing them, while the hybrid nature aims to balance communication and computation overheads.
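
To make the CutPoint workflow concrete, here is a minimal hedged sketch of an annotated model. The layer types and sizes are invented for illustration; only the idea of inserting CutPoint modules at candidate stage boundaries is taken from the project's documentation.

    import torch.nn as nn
    from varuna import CutPoint  # marks a potential pipeline-stage boundary

    class ToyModel(nn.Module):
        def __init__(self, hidden=1024, num_blocks=4):
            super().__init__()
            self.blocks = nn.ModuleList(
                nn.Linear(hidden, hidden) for _ in range(num_blocks)
            )
            # One candidate cut point between consecutive blocks; Varuna decides
            # which of these become actual stage boundaries when partitioning.
            self.cutpoints = nn.ModuleList(
                CutPoint() for _ in range(num_blocks - 1)
            )

        def forward(self, x):
            for i, block in enumerate(self.blocks):
                x = block(x)
                if i < len(self.cutpoints):
                    x = self.cutpoints[i](x)  # identity unless chosen as a boundary
            return x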

Quick Start & Requirements

  • Installation: Requires Python 3, PyTorch (1.5+), and Apex. Apex must be patched using the provided apex.patch before building.
    git clone https://github.com/NVIDIA/apex
    cp apex.patch /path/to/apex/
    cd /path/to/apex
    git apply apex.patch
    pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
    
    Then, install Varuna:
    git clone <varuna_repo>
    cd varuna
    python setup.py install
    
  • Prerequisites: PyTorch, Apex (with patch), Python 3.
  • Launch: Use run_varuna.py for distributed execution (see the launch sketch after this list).
  • Docs: Available in the docs/ folder (html/index.html, varuna.pdf). Examples for BERT and Megatron-LM are in examples/.
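
A hedged sketch of a launch invocation follows. The flag names (--machine_list, --gpus_per_node, --nstages, --chunk_size, --batch_size) reflect typical Varuna usage but should be verified against docs/; machines.txt and the training script name are placeholders.

    # Illustrative only; confirm flag names and semantics in docs/ before use.
    python run_varuna.py \
        --machine_list machines.txt \
        --gpus_per_node 4 \
        --nstages 4 \
        --chunk_size 8 \
        --batch_size 1024 \
        your_training_script.py [script args ...]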

Highlighted Details

  • Implements pipeline parallelism and data parallelism for large model training.
  • Supports dynamic resource scaling ("job morphing") via signal handling for checkpointing and relaunching (see the signal-handler sketch after this list).
  • Includes an auto-configuration module that profiles model/network performance to suggest optimal parallelism settings.
  • Handles FP16 mixed-precision training and parameter sharing across stages.
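
A minimal hedged sketch of the signal handling used for job morphing, assuming SIGUSR1 is the signal the launcher sends before reshaping the job (confirm the exact signal and checkpoint convention in docs/ and examples/); save_checkpoint is a hypothetical user-defined helper.

    import signal
    import sys

    def on_morph_signal(signum, frame):
        # Persist enough state (model, optimizer, step) to resume after relaunch.
        save_checkpoint(model, optimizer, step)  # hypothetical user-defined helper
        sys.exit(0)  # exit cleanly so the job can be relaunched on the new resources

    # Register early in the training script. Assumption: SIGUSR1 is the morph signal.
    signal.signal(signal.SIGUSR1, on_morph_signal)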

Maintenance & Community

  • Based on the paper "Varuna: Scalable, Low-cost Training of Massive Deep Learning Models" (EuroSys'22).
  • No explicit community links (Discord/Slack) or active contributor information provided in the README.

Licensing & Compatibility

  • The README does not explicitly state a license; check the repository's LICENSE file. The Apex dependency is distributed under a BSD 3-Clause license, but Varuna's own license requires verification.

Limitations & Caveats

  • Requires manual annotation of models with CutPoint instances.
  • The setup process for Apex patching can be fragile and version-dependent.
  • Job morphing requires user-implemented signal handlers in training scripts.
  • Auto-configuration requires a separate profiling step.

Health Check

  • Last Commit: 1 year ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

  • 1 star in the last 30 days

Explore Similar Projects

Starred by Tri Dao (Chief Scientist at Together AI), Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake), and 1 more.

oslo by tunib-ai

  • 0% · 309 stars
  • Framework for large-scale transformer optimization
  • Created 3 years ago · Updated 3 years ago

Starred by Yaowei Zheng (Author of LLaMA-Factory), Yineng Zhang (Inference Lead at SGLang; Research Scientist at Together AI), and 1 more.

VeOmni by ByteDance-Seed

  • 3.4% · 1k stars
  • Framework for scaling multimodal model training across accelerators
  • Created 5 months ago · Updated 3 weeks ago