Collection of PyTorch performance optimization tricks
This repository compiles practical techniques and optimizations for PyTorch users, focusing on accelerating training and reducing memory consumption. It targets researchers and engineers working with deep learning models who aim to improve efficiency and performance. The collection offers actionable advice, code snippets, and links to relevant resources for faster iteration and resource management.
How It Works
The project aggregates strategies across data loading, model design, training procedures, and code-level optimizations. It covers techniques like prefetching data, using efficient image processing libraries (OpenCV, DALI), consolidating data into single files (LMDB, TFRecord), and leveraging mixed-precision training (FP16, AMP). It also details memory-saving methods such as gradient accumulation, gradient checkpointing, and in-place operations.
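As a concrete illustration of the mixed-precision and gradient-accumulation advice, the following is a minimal sketch rather than code from the repository: `model`, `optimizer`, `train_loader`, and `accum_steps` are hypothetical placeholders, and it relies on the standard `torch.cuda.amp` API.

```python
import torch

def train_one_epoch(model, optimizer, train_loader, accum_steps=4, device="cuda"):
    # GradScaler rescales the loss so FP16 gradients do not underflow.
    scaler = torch.cuda.amp.GradScaler()
    model.train()
    optimizer.zero_grad(set_to_none=True)

    for step, (inputs, targets) in enumerate(train_loader):
        inputs, targets = inputs.to(device), targets.to(device)

        # Run the forward pass in mixed precision (FP16 where safe, FP32 elsewhere).
        with torch.cuda.amp.autocast():
            loss = torch.nn.functional.cross_entropy(model(inputs), targets)

        # Divide by accum_steps so the accumulated gradient matches a larger batch.
        scaler.scale(loss / accum_steps).backward()

        # Step the optimizer only every accum_steps micro-batches.
        if (step + 1) % accum_steps == 0:
            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad(set_to_none=True)
```

The effective batch size here is `accum_steps` times the loader's batch size, which is what lets gradient accumulation trade compute time for memory.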
Quick Start & Requirements
There is nothing to install: the repository is a curated list of tips, so readers browse the sections and apply the relevant snippets to their own PyTorch code. Many of the techniques assume a CUDA-capable GPU.
Highlighted Details
Memory-saving techniques such as torch.no_grad(), set_to_none=True for zero_grad, and in-place operations are detailed, alongside speed-oriented settings such as torch.backends.cudnn.benchmark = True, pin_memory=True, DistributedDataParallel, mixed-precision training, and optimizing batch sizes.
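To show how several of these flags fit together, here is an illustrative sketch, again not taken from the repository; the helper names `make_loader`, `train_step`, and `evaluate`, and the dataset, model, and criterion arguments are assumptions.

```python
import torch
from torch.utils.data import DataLoader

# Let cuDNN auto-tune convolution kernels; helps when input shapes are fixed.
torch.backends.cudnn.benchmark = True

def make_loader(dataset, batch_size=256):
    # num_workers prefetches batches in background processes; pin_memory uses
    # page-locked host buffers so non_blocking GPU copies can overlap compute.
    return DataLoader(dataset, batch_size=batch_size, shuffle=True,
                      num_workers=4, pin_memory=True, persistent_workers=True)

def train_step(model, optimizer, criterion, inputs, targets, device="cuda"):
    inputs = inputs.to(device, non_blocking=True)
    targets = targets.to(device, non_blocking=True)
    # set_to_none=True frees the gradient tensors instead of filling them with zeros.
    optimizer.zero_grad(set_to_none=True)
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()  # skip autograd bookkeeping during evaluation
def evaluate(model, inputs, device="cuda"):
    model.eval()
    return model(inputs.to(device, non_blocking=True))
```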
Maintenance & Community
The repository appears to be a personal collection with updates spanning from 2019 to 2024. The author encourages community suggestions. Links to Zhihu are provided for discussion.
Licensing & Compatibility
The repository does not state a license. The content is primarily a compilation of shared knowledge and links to external resources, and appears intended for learning and adaptation, but no explicit reuse terms are given.
Limitations & Caveats
This is a curated collection of tips rather than a runnable library, so users must integrate the advice into their own projects. Some techniques, like gradient accumulation, can affect batch-size-dependent layers (e.g., BatchNorm, whose statistics are computed per micro-batch rather than per effective batch). The effectiveness of a given optimization is workload-dependent and is best verified by profiling.