PyTorch extension for streamlined mixed precision & distributed training
NVIDIA Apex provides PyTorch extensions for streamlined mixed-precision and distributed training, targeting researchers and engineers seeking to accelerate deep learning workflows. It offers utilities for automatic mixed precision (AMP) and optimized distributed data parallelism, aiming to simplify complex training configurations and improve performance.
How It Works
Apex historically offered apex.amp for automatic mixed precision, enabling FP16 training with changes to only a few lines of code. It also provided apex.parallel.DistributedDataParallel for efficient multi-process distributed training, optimized with NVIDIA's NCCL communication library. While these specific modules are now deprecated in favor of upstream PyTorch equivalents, Apex continues to offer specialized fused kernels and optimized implementations for specific operations, particularly within its apex.contrib modules.
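For orientation, the sketch below shows the upstream pattern that replaces the deprecated apex.amp: PyTorch's native torch.cuda.amp autocast/GradScaler API (the model, data, and hyperparameters are placeholders, not Apex code). The deprecated apex.parallel.DistributedDataParallel is similarly superseded by torch.nn.parallel.DistributedDataParallel.

import torch

model = torch.nn.Linear(1024, 10).cuda()            # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()                 # keeps FP16 gradients from underflowing

for _ in range(10):                                  # placeholder training loop
    inputs = torch.randn(32, 1024, device="cuda")
    targets = torch.randint(0, 10, (32,), device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():                  # run the forward pass in mixed precision
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)
    scaler.scale(loss).backward()                    # scale the loss before backward
    scaler.step(optimizer)                           # unscale gradients, then take the optimizer step
    scaler.update()                                  # adjust the loss scale for the next iteration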
Quick Start & Requirements
git clone https://github.com/NVIDIA/apex
cd apex
# For pip >= 23.1
pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
# Or for older pip
# pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./
A Python-only build is available via pip install -v --no-cache-dir ./
apex.contrib modules may require newer CUDA versions (e.g., CUDA >= 11 for fused_weight_gradient_mlp_cuda, cuDNN >= 8.5 for cudnn_gbn_lib).
The --parallel or --threads flags can speed up compilation.
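After installation, a quick way to confirm that the compiled extensions are importable is to exercise one of the fused components; FusedLayerNorm and FusedAdam are used here only as representative examples and can be swapped for whichever extension your workload needs.

import torch
from apex.normalization import FusedLayerNorm   # requires the --cuda_ext build
from apex.optimizers import FusedAdam           # fused Adam optimizer from Apex

layer = FusedLayerNorm(1024).cuda()
optimizer = FusedAdam(layer.parameters(), lr=1e-3)
out = layer(torch.randn(8, 1024, device="cuda"))
print(out.shape)  # expected: torch.Size([8, 1024])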
Highlighted Details
Specialized fused kernels and optimized operations are collected under apex.contrib.
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
apex.contrib modules may not support all PyTorch releases and might have specific dependency requirements (e.g., newer CUDA/cuDNN versions).
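Because of that, code that depends on apex.contrib is often written to degrade gracefully; the sketch below guards the import and falls back to plain PyTorch. FastLayerNorm is used purely as an illustrative contrib module, and whether it builds depends on your CUDA/cuDNN and PyTorch versions.

import torch

try:
    # Illustrative contrib module; availability depends on the installed
    # CUDA/cuDNN and PyTorch versions.
    from apex.contrib.layer_norm import FastLayerNorm
    norm = FastLayerNorm(1024).cuda()
except ImportError:
    # Fall back to the standard PyTorch implementation.
    norm = torch.nn.LayerNorm(1024).cuda()

out = norm(torch.randn(8, 1024, device="cuda"))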