Collection of PyTorch performance optimization tricks
This repository compiles practical techniques and optimizations for PyTorch users, focusing on accelerating training and reducing memory consumption. It targets researchers and engineers working with deep learning models who aim to improve efficiency and performance. The collection offers actionable advice, code snippets, and links to relevant resources for faster iteration and resource management.
How It Works
The project aggregates strategies across data loading, model design, training procedures, and code-level optimizations. It covers techniques like prefetching data, using efficient image processing libraries (OpenCV, DALI), consolidating data into single files (LMDB, TFRecord), and leveraging mixed-precision training (FP16, AMP). It also details memory-saving methods such as gradient accumulation, gradient checkpointing, and in-place operations.
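As a concrete illustration of the mixed-precision and gradient-accumulation advice, the following is a minimal sketch rather than code from the repository: `model`, `optimizer`, `train_loader`, and `accum_steps` are hypothetical placeholders, and it relies on the standard `torch.cuda.amp` API.

```python
import torch

def train_one_epoch(model, optimizer, train_loader, accum_steps=4, device="cuda"):
    # GradScaler rescales the loss so FP16 gradients do not underflow.
    scaler = torch.cuda.amp.GradScaler()
    model.train()
    optimizer.zero_grad(set_to_none=True)

    for step, (inputs, targets) in enumerate(train_loader):
        inputs, targets = inputs.to(device), targets.to(device)

        # Run the forward pass in mixed precision (FP16 where safe, FP32 elsewhere).
        with torch.cuda.amp.autocast():
            loss = torch.nn.functional.cross_entropy(model(inputs), targets)

        # Divide by accum_steps so the accumulated gradient matches a larger batch.
        scaler.scale(loss / accum_steps).backward()

        # Step the optimizer only every accum_steps micro-batches.
        if (step + 1) % accum_steps == 0:
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad(set_to_none=True)
```

The effective batch size here is `accum_steps` times the loader's batch size, which is what lets gradient accumulation trade compute time for memory.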
Quick Start & Requirements
There is nothing to install: the repository is a curated list of tips, so readers browse the sections and apply the relevant snippets to their own PyTorch code. Many of the techniques assume a CUDA-capable GPU.
Highlighted Details
Memory-saving techniques such as torch.no_grad(), set_to_none=True for zero_grad, and in-place operations are detailed, alongside speed-oriented settings such as torch.backends.cudnn.benchmark = True, pin_memory=True, DistributedDataParallel, mixed-precision training, and optimizing batch sizes.
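To show how several of these flags fit together, here is an illustrative sketch, again not taken from the repository; the helper names `make_loader`, `train_step`, and `evaluate`, and the dataset, model, and criterion arguments are assumptions.

```python
import torch
from torch.utils.data import DataLoader

# Let cuDNN auto-tune convolution kernels; helps when input shapes are fixed.
torch.backends.cudnn.benchmark = True

def make_loader(dataset, batch_size=256):
    # num_workers prefetches batches in background processes; pin_memory uses
    # page-locked host buffers so non_blocking GPU copies can overlap compute.
    return DataLoader(dataset, batch_size=batch_size, shuffle=True,
                      num_workers=4, pin_memory=True, persistent_workers=True)

def train_step(model, optimizer, criterion, inputs, targets, device="cuda"):
    inputs = inputs.to(device, non_blocking=True)
    targets = targets.to(device, non_blocking=True)
    # set_to_none=True frees the gradient tensors instead of filling them with zeros.
    optimizer.zero_grad(set_to_none=True)
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()  # skip autograd bookkeeping during evaluation
def evaluate(model, inputs, device="cuda"):
    model.eval()
    return model(inputs.to(device, non_blocking=True))
```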
Maintenance & Community
The repository appears to be a personal collection with updates spanning from 2019 to 2024. The author encourages community suggestions. Links to Zhihu are provided for discussion.
Licensing & Compatibility
The repository does not state a license. The content is primarily a compilation of shared knowledge and links to external resources, and appears intended for learning and adaptation, but no explicit reuse terms are given.
Limitations & Caveats
This is a curated collection of tips rather than a runnable library, so users must integrate the advice into their own projects. Some techniques, like gradient accumulation, can affect batch-size-dependent layers (e.g., BatchNorm, whose statistics are computed per micro-batch rather than per effective batch). The effectiveness of a given optimization is workload-dependent and is best verified by profiling.