apex by NVIDIA

PyTorch extension for streamlined mixed precision & distributed training

created 7 years ago
8,749 stars

Top 5.9% on sourcepulse

Project Summary

NVIDIA Apex provides PyTorch extensions for streamlined mixed-precision and distributed training, targeting researchers and engineers seeking to accelerate deep learning workflows. It offers utilities for automatic mixed precision (AMP) and optimized distributed data parallelism, aiming to simplify complex training configurations and improve performance.

How It Works

Apex historically offered apex.amp for automatic mixed precision, simplifying the integration of FP16 training by modifying only a few lines of code. It also provided apex.parallel.DistributedDataParallel for efficient multi-process distributed training, optimized with NVIDIA's NCCL communication library. While these specific modules are now deprecated in favor of upstream PyTorch equivalents, Apex continues to offer specialized fused kernels and optimized implementations for specific operations, particularly within its apex.contrib modules.
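
For historical context, the (now deprecated) apex.amp workflow looked roughly like the following; a minimal sketch assuming a standard CUDA model, optimizer, and loss:

    import torch
    from apex import amp

    model = torch.nn.Linear(10, 10).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

    # amp.initialize patches the model and optimizer for mixed precision;
    # opt_level="O1" casts whitelisted ops to FP16 automatically.
    model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

    x = torch.randn(4, 10, device="cuda")
    target = torch.randn(4, 10, device="cuda")
    loss = torch.nn.functional.mse_loss(model(x), target)

    # scale_loss applies dynamic loss scaling before the backward pass
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    optimizer.step()

This is the "few lines of code" the deprecated API required on top of an ordinary FP32 training loop.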

Quick Start & Requirements

  • Installation: Recommended installation from source with CUDA and C++ extensions:
    git clone https://github.com/NVIDIA/apex
    cd apex
    # For pip >= 23.1
    pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --config-settings "--build-option=--cpp_ext" --config-settings "--build-option=--cuda_ext" ./
    # Or for older pip
    # pip install -v --disable-pip-version-check --no-cache-dir --no-build-isolation --global-option="--cpp_ext" --global-option="--cuda_ext" ./
    
    A Python-only build is available via pip install -v --no-cache-dir ./.
  • Prerequisites: CUDA Toolkit, C++ compiler (GCC/Clang), Ninja (recommended for faster compilation). Specific apex.contrib modules may require newer CUDA versions (e.g., CUDA >= 11 for fused_weight_gradient_mlp_cuda, cuDNN >= 8.5 for cudnn_gbn_lib).
  • Resources: Compilation can be resource-intensive; passing the --parallel build option or NVCC's --threads flag can speed it up.
  • Documentation: https://nvidia.github.io/apex

Highlighted Details

  • Provides fused kernels for performance-critical operations (e.g., LayerNorm, Adam, multi-head attention); see the sketch after this list.
  • Includes specialized implementations for distributed training optimizations beyond standard PyTorch.
  • Offers utilities for synchronized batch normalization and custom loss functions within apex.contrib.
  • Supports installation via NVIDIA NGC PyTorch containers.
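
As an illustration of the fused kernels mentioned above, the following minimal sketch swaps in FusedLayerNorm and FusedAdam; it assumes Apex was built with the --cuda_ext option so the fused CUDA kernels are available:

    import torch
    from apex.normalization import FusedLayerNorm
    from apex.optimizers import FusedAdam

    # FusedLayerNorm is a drop-in replacement for torch.nn.LayerNorm
    # backed by a fused CUDA kernel
    model = torch.nn.Sequential(
        torch.nn.Linear(512, 512),
        FusedLayerNorm(512),
    ).cuda()

    # FusedAdam performs the Adam update with fused kernels per step
    optimizer = FusedAdam(model.parameters(), lr=1e-4)

    x = torch.randn(8, 512, device="cuda")
    loss = model(x).square().mean()
    loss.backward()
    optimizer.step()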

Maintenance & Community

  • Maintained by NVIDIA.
  • Some components are deprecated in favor of upstream PyTorch.
  • Check the repository for community contributions and potential forks.

Licensing & Compatibility

  • License: BSD-3-Clause.
  • Compatible with commercial and closed-source applications.

Limitations & Caveats

  • Core AMP and DistributedDataParallel utilities are deprecated; users are advised to migrate to PyTorch's native implementations (see the sketch after this list).
  • apex.contrib modules may not support all PyTorch releases and might have specific dependency requirements (e.g., newer CUDA/cuDNN versions).
  • Windows support is experimental and may require building PyTorch from source.
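
For reference, the migration target for apex.amp is PyTorch's built-in AMP (and torch.nn.parallel.DistributedDataParallel replaces apex.parallel.DistributedDataParallel); a minimal sketch of the native equivalent of the deprecated loop above:

    import torch

    model = torch.nn.Linear(10, 10).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    scaler = torch.cuda.amp.GradScaler()  # dynamic loss scaling, as apex.amp provided

    x = torch.randn(4, 10, device="cuda")
    target = torch.randn(4, 10, device="cuda")

    with torch.cuda.amp.autocast():  # eligible ops run in reduced precision
        loss = torch.nn.functional.mse_loss(model(x), target)

    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()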

Health Check

  • Last commit: 3 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 7
  • Issues (30d): 3
  • Star History: 131 stars in the last 90 days

Explore Similar Projects

nunchaku by nunchaku-tech
  • High-performance 4-bit diffusion model inference engine
  • Top 2.1% on sourcepulse; 3k stars
  • Created 8 months ago; updated 14 hours ago
  • Starred by Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), Jaret Burkett (Founder of Ostris), and 1 more.

FasterTransformer by NVIDIA
  • Optimized transformer library for inference
  • Top 0.2% on sourcepulse; 6k stars
  • Created 4 years ago; updated 1 year ago
  • Starred by Nat Friedman (Former CEO of GitHub), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 6 more.

DeepSpeed by deepspeedai
  • Deep learning optimization library for distributed training and inference
  • Top 0.2% on sourcepulse; 40k stars
  • Created 5 years ago; updated 1 day ago
  • Starred by Aravind Srinivas (Cofounder of Perplexity), Stas Bekman (Author of Machine Learning Engineering Open Book; Research Engineer at Snowflake), and 12 more.