pytorch-memonger by Lyken17

PyTorch module for sublinear memory optimization

Created 6 years ago
603 stars

Top 54.3% on SourcePulse

Project Summary

This repository provides a sublinear memory optimization for PyTorch deep learning models. It is aimed at researchers and practitioners who need to train larger models or use larger batch sizes within limited GPU memory. It reduces the activation memory that backpropagation requires from linear to square-root in the number of layers.
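Here is a rough illustration of where the square root comes from. It uses a toy cost model, which is an assumption and not memonger's own accounting. Take N layers whose activations each cost one unit, and split them into segments of k layers. During the forward pass, keep only each segment's input. During backward, recompute one segment at a time. The peak is then about ceil(N/k) + k units, which is smallest near k = sqrt(N). The helper names below are hypothetical.

```python
import math


def peak_units(n_layers: int, seg_len: int) -> int:
    """Toy peak-memory model, in units of one layer's activation:
    segment checkpoints kept for the whole pass, plus one segment's
    activations rematerialized during backward."""
    checkpoints = math.ceil(n_layers / seg_len)
    return checkpoints + seg_len


def best_segment_length(n_layers: int) -> int:
    # Brute force over every segment length; the minimum sits near sqrt(n_layers).
    return min(range(1, n_layers + 1), key=lambda k: peak_units(n_layers, k))


for n in (16, 100, 400):
    k = best_segment_length(n)
    print(f"{n} layers: store-all = {n} units, checkpointed = {peak_units(n, k)} units (k = {k})")
```

For 100 layers, this model gives 20 units instead of 100. That roughly 2*sqrt(N) peak is what the O(sqrt(N)) bound refers to. The price is about one extra forward pass, because every segment is re-run once during backward.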

How It Works

The core idea is to trade computation for memory. Instead of storing every intermediate activation from the forward pass, the library re-computes activations during the backward pass. It is used by replacing PyTorch's nn.Sequential with memonger.SublinearSequential. Layers whose forward pass is not a pure function of its input need extra handling, because recomputation runs them a second time. BatchNorm updates its running statistics on every forward pass, so its momentum is re-scaled. Dropout draws a random mask, so its RNG state is memorized and replayed.
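The recompute-instead-of-store schedule can be sketched without any framework. This toy is not memonger's code, and all names in it are illustrative. LAYERS is a chain of scalar functions with known derivatives. `grad_checkpointed` keeps only the input of every `seg_len`-th layer. During the backward walk, it re-runs each segment's forward pass from that segment's checkpoint and then applies the chain rule.

```python
import math

# Hypothetical toy network: 16 scalar "layers", each paired with its derivative.
LAYERS = [(math.sin, math.cos), (math.tanh, lambda x: 1 - math.tanh(x) ** 2)] * 8


def grad_store_all(x):
    """Plain backprop: keep every activation from the forward pass (O(N) values)."""
    acts = [x]
    for f, _ in LAYERS:
        acts.append(f(acts[-1]))
    g = 1.0
    # Walk backwards, multiplying each layer's derivative at its stored input.
    for (_, df), a in zip(reversed(LAYERS), reversed(acts[:-1])):
        g *= df(a)
    return g, len(acts)


def grad_checkpointed(x, seg_len):
    """Keep only segment inputs; recompute each segment's activations during backward."""
    n = len(LAYERS)
    checkpoints = {}
    a = x
    for i, (f, _) in enumerate(LAYERS):
        if i % seg_len == 0:
            checkpoints[i] = a  # input to the segment that starts at layer i
        a = f(a)
    g, peak = 1.0, 0
    for start in sorted(checkpoints, reverse=True):
        end = min(start + seg_len, n)
        # Re-run the forward pass for this segment only, starting from its checkpoint.
        seg_acts = [checkpoints[start]]
        for f, _ in LAYERS[start:end - 1]:
            seg_acts.append(f(seg_acts[-1]))
        # Values held right now: all checkpoints plus the recomputed in-segment ones.
        peak = max(peak, len(checkpoints) + len(seg_acts) - 1)
        for (_, df), a in zip(reversed(LAYERS[start:end]), reversed(seg_acts)):
            g *= df(a)
    return g, peak


g_ref, held_ref = grad_store_all(0.5)
g_ckpt, held_ckpt = grad_checkpointed(0.5, seg_len=4)
print(g_ref, g_ckpt, held_ref, held_ckpt)
```

With `seg_len = 4` (the square root of 16), the checkpointed pass holds 7 values at its peak, versus 17 for plain backprop, and returns the same gradient. The cost is that most layers run their forward computation twice.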

Quick Start & Requirements

  • Install via pip: pip install pytorch-memonger
  • Requires PyTorch.
  • Supports nn.Sequential models.
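The README's entry point is memonger.SublinearSequential, used in place of nn.Sequential. Its constructor arguments are not documented here. The sketch below therefore uses PyTorch's own `torch.utils.checkpoint.checkpoint_sequential` instead. That built-in applies the same segment-and-recompute idea to an nn.Sequential. The sketch assumes PyTorch 2.x (for the `use_reentrant` argument) and picks about sqrt(N) segments. The layer sizes are arbitrary.

```python
import math

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

torch.manual_seed(0)

# A deep stack of small blocks inside nn.Sequential, the structure memonger targets.
depth = 16
model = nn.Sequential(*[nn.Sequential(nn.Linear(64, 64), nn.ReLU()) for _ in range(depth)])

# About sqrt(N) segments, matching the O(sqrt(N)) schedule.
segments = max(1, round(math.sqrt(len(model))))

x = torch.randn(8, 64, requires_grad=True)

# Only segment boundaries are kept; activations inside each segment are
# recomputed during backward.
out = checkpoint_sequential(model, segments, x, use_reentrant=False)
out.sum().backward()
grad_ckpt = x.grad.clone()

# Reference: an ordinary forward/backward that stores every activation.
x.grad = None
model(x).sum().backward()
grad_ref = x.grad.clone()

print(segments, torch.allclose(grad_ckpt, grad_ref, atol=1e-6))
```

PyTorch's checkpointing restores the RNG state by default (`preserve_rng_state=True`), which covers Dropout. The sketch deliberately contains no BatchNorm, because recomputation would likely apply its running-statistics update a second time. The README says memonger compensates for this by re-scaling the momentum.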

Highlighted Details

  • Reduces activation memory for the backward pass from O(N) to O(sqrt(N)) for an N-layer network.
  • Achieves substantial memory savings: ResNet-152 at batch size 16 uses 2455 MiB with memonger versus 5459 MiB without, roughly 55% less.
  • Supports BatchNorm and Dropout, with special handling so that re-running them during recomputation leaves BatchNorm running statistics and Dropout masks consistent with a single forward pass.
  • Re-implements the paper "Training Deep Nets with Sublinear Memory Cost" (Chen et al., 2016).
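The README says memonger re-scales BatchNorm momentum and memorizes RNG states for Dropout, but it does not spell out the formulas here. The sketch below shows one consistent way to get both right in plain Python. All names in it are illustrative. First, suppose a BatchNorm-style running average sees the same batch statistic twice. Using m' = 1 - sqrt(1 - m) for each update then reproduces a single update with momentum m, because (1 - m')^2 = 1 - m. Second, saving the RNG state before the first forward pass and restoring it before recomputation yields the identical Dropout mask.

```python
import math
import random


def ema_update(running, batch_stat, momentum):
    # PyTorch's BatchNorm convention: new = (1 - momentum) * old + momentum * batch.
    return (1 - momentum) * running + momentum * batch_stat


def rescaled_momentum(m):
    # Two updates with m' must equal one update with m: (1 - m')**2 == 1 - m.
    return 1 - math.sqrt(1 - m)


m, running, batch_mean = 0.1, 0.0, 2.0
once = ema_update(running, batch_mean, m)  # normal training: one forward pass
m2 = rescaled_momentum(m)
# Forward plus recomputation: the same batch statistic is applied twice.
twice = ema_update(ema_update(running, batch_mean, m2), batch_mean, m2)


def dropout_mask(rng, n, p):
    # Inverted dropout: zero with probability p, otherwise scale by 1 / (1 - p).
    return [0.0 if rng.random() < p else 1.0 / (1 - p) for _ in range(n)]


rng = random.Random(0)
saved = rng.getstate()            # memorize the RNG state before the first forward
mask_forward = dropout_mask(rng, 8, 0.5)
rng.setstate(saved)               # restore it before recomputing the segment
mask_recompute = dropout_mask(rng, 8, 0.5)

print(once, twice, mask_forward == mask_recompute)
```

In PyTorch itself, `torch.get_rng_state()` / `torch.set_rng_state()` and their `torch.cuda` counterparts serve the same save-and-restore role.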

Maintenance & Community

  • The project appears to be a personal implementation by Lyken17. No explicit community channels or roadmap are mentioned.

Licensing & Compatibility

  • The README does not specify a license.

Limitations & Caveats

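  • Only supports nn.Sequential models. PyTorch is define-by-run, so the full graph of an arbitrary forward() is not known ahead of time. A Sequential, by contrast, exposes its layers as an ordered list that can be split into segments.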
  • Layers with randomness or internal state still need care. BatchNorm and Dropout are handled, but other non-deterministic or stateful layers may not be.

Health Check
Last Commit

5 years ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
0
Star History
4 stars in the last 30 days

Explore Similar Projects

Starred by Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral), Yaowei Zheng (Author of LLaMA-Factory), and 4 more.

ml-cross-entropy by apple

0.4%
520
PyTorch module for memory-efficient cross-entropy in LLMs
Created 10 months ago
Updated 1 day ago
Starred by Ying Sheng (Coauthor of SGLang) and Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

llm-analysis by cli99

0.4%
455
CLI tool for LLM latency/memory analysis during training/inference
Created 2 years ago
Updated 5 months ago
Starred by Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n), Pawel Garbacki (Cofounder of Fireworks AI), and 11 more.

Liger-Kernel by linkedin

0.6%
6k
Triton kernels for efficient LLM training
Created 1 year ago
Updated 1 day ago