pytorch-memonger by Lyken17

PyTorch module for sublinear memory optimization

Created 6 years ago

604 stars

Top 54.2% on SourcePulse

Project Summary

This repository provides a sublinear memory optimization for PyTorch deep learning models. It is aimed at researchers and practitioners who need to train larger models, or use larger batch sizes, within limited GPU memory. It reduces the activation memory needed for backpropagation from linear to square-root in the number of layers.
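A rough cost model makes the square-root claim concrete. Assume a chain of n equally sized layers is split into k segments. The forward pass keeps only the k segment-boundary activations, and the backward pass re-computes one segment (about n/k activations) at a time. Peak storage is then about k + n/k, which is smallest near k = sqrt(n). This is a simplified sketch, and the function names are illustrative, not part of memonger.

```python
import math


def peak_activations(n_layers: int, n_segments: int) -> int:
    """Approximate peak number of stored activations for a checkpointed chain.

    Assumes every layer output costs the same memory. The forward pass keeps
    one checkpoint per segment, and the backward pass re-materializes one
    segment (ceil(n / k) activations) at a time.
    """
    segment_len = math.ceil(n_layers / n_segments)
    return n_segments + segment_len


def best_segments(n_layers: int) -> int:
    """Brute-force the segment count that minimizes the peak estimate."""
    return min(range(1, n_layers + 1), key=lambda k: peak_activations(n_layers, k))


if __name__ == "__main__":
    for n in (16, 100, 400):
        k = best_segments(n)
        print(f"n={n}: store all = {n}, best k = {k}, peak ~ {peak_activations(n, k)}")
```

The best k tracks sqrt(n), so peak storage grows like 2*sqrt(n) rather than n. The price is roughly one extra forward pass of compute.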

How It Works

The core idea is to trade computation for memory. Instead of storing every intermediate activation from the forward pass, the module keeps only a subset and re-computes the rest during the backward pass. To use it, replace PyTorch's nn.Sequential with memonger.SublinearSequential. Some layers are not pure functions of their input and need special handling to stay correct when they are re-run. BatchNorm updates its running statistics on every forward pass, so its momentum is re-scaled. Dropout draws a random mask, so the RNG state is memorized and restored for the re-computation.
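The recompute-instead-of-store idea can be illustrated without PyTorch. This is a toy sketch, not memonger's implementation. Each layer is a scalar function paired with its derivative. The forward pass keeps only segment-boundary inputs, and the backward pass re-runs each segment to recover the inputs the chain rule needs. The `Layer` type and both function names are hypothetical.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical toy layer: (forward function, derivative of that function).
Layer = Tuple[Callable[[float], float], Callable[[float], float]]


def grad_store_all(layers: List[Layer], x: float) -> Tuple[float, float, int]:
    """Standard backprop: keep every layer input from the forward pass."""
    inputs = []
    h = x
    for f, _ in layers:
        inputs.append(h)
        h = f(h)
    grad = 1.0
    for (_, df), a in zip(reversed(layers), reversed(inputs)):
        grad *= df(a)  # chain rule, from the last layer back to the first
    return h, grad, len(inputs)


def grad_checkpointed(layers: List[Layer], x: float, segments: int) -> Tuple[float, float, int]:
    """Backprop that stores only segment inputs and re-computes the rest.

    The third return value is an upper bound on simultaneously stored inputs
    (checkpoints are not freed early in this toy version).
    """
    size = math.ceil(len(layers) / segments)
    checkpoints = []  # input of each segment, kept from the forward pass
    h = x
    for i, (f, _) in enumerate(layers):
        if i % size == 0:
            checkpoints.append(h)
        h = f(h)
    grad, peak = 1.0, len(checkpoints)
    for s in reversed(range(len(checkpoints))):
        seg = layers[s * size:(s + 1) * size]
        # Re-run this segment's forward pass from its checkpoint.
        acts = [checkpoints[s]]
        for f, _ in seg[:-1]:
            acts.append(f(acts[-1]))
        peak = max(peak, len(checkpoints) + len(acts) - 1)
        for (_, df), a in zip(reversed(seg), reversed(acts)):
            grad *= df(a)
    return h, grad, peak


if __name__ == "__main__":
    chain = [(math.sin, math.cos)] * 16  # 16 identical toy layers
    _, g_full, stored_full = grad_store_all(chain, 0.5)
    _, g_ckpt, stored_ckpt = grad_checkpointed(chain, 0.5, segments=4)
    print(g_full, g_ckpt)            # same gradient
    print(stored_full, stored_ckpt)  # 16 vs. 7
```

Both versions multiply the same derivatives in the same order, so the gradient is unchanged. Only the number of values held in memory differs.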

Quick Start & Requirements

Install via pip: pip install pytorch-memonger
Requires PyTorch.
Supports nn.Sequential models.

Highlighted Details

Reduces backward-pass activation memory from O(N) to O(sqrt(N)) for an N-layer network.
Achieves significant memory savings: ResNet152 with batch size 16 uses 2455MiB with sublinear memory vs. 5459MiB without, about 55% less.
Supports BatchNorm and Dropout layers with specific handling for non-deterministic behavior.
A re-implementation of the paper "Training Deep Nets with Sublinear Memory Cost" (Chen et al., 2016).
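Some arithmetic shows why BatchNorm needs its momentum re-scaled. PyTorch updates a running statistic as r <- (1 - m) * r + m * mu. Suppose the re-computation repeats that update on the same batch statistic mu, using momentum m'. The result is (1 - m')^2 * r + (1 - (1 - m')^2) * mu. This matches a single update with momentum m when m' = 1 - sqrt(1 - m). This is one consistent choice under the assumption that both passes see identical batch statistics. It is not quoted from memonger's source, and the function names are hypothetical.

```python
import math


def ema_update(running: float, batch_stat: float, momentum: float) -> float:
    """PyTorch-style running-statistic update: (1 - m) * r + m * batch_stat."""
    return (1.0 - momentum) * running + momentum * batch_stat


def rescaled_momentum(momentum: float) -> float:
    """Momentum m' such that two updates with m' equal one update with m.

    Assumes the forward pass and its re-computation see the same batch statistic.
    """
    return 1.0 - math.sqrt(1.0 - momentum)


if __name__ == "__main__":
    running, batch_mean, m = 1.0, 4.0, 0.1
    once = ema_update(running, batch_mean, m)  # what training without recompute gives
    m2 = rescaled_momentum(m)
    twice = ema_update(ema_update(running, batch_mean, m2), batch_mean, m2)
    naive = ema_update(ema_update(running, batch_mean, m), batch_mean, m)
    print(once, twice, naive)  # once and twice agree; naive over-weights the batch
```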

Maintenance & Community

The project appears to be a personal implementation by Lyken17. No explicit community channels or roadmap are mentioned.

Licensing & Compatibility

The README does not specify a license.

Limitations & Caveats

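Only supports nn.Sequential models. PyTorch builds its graph on the fly (define-by-run), so there is no static graph to partition ahead of time. nn.Sequential supplies an explicit layer order instead.
Layers with randomness or internal state need special care during re-computation. BatchNorm and Dropout are handled, but other such layers may not be.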
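The Dropout problem, and the fix of memorizing RNG state, can be reproduced with Python's standard `random` module. This toy sketch does not involve PyTorch's own RNG, and the `dropout_mask` helper is hypothetical. A naive re-computation draws from an RNG that has already advanced, so it produces a different mask than the forward pass used. Restoring the saved state reproduces the original mask exactly.

```python
import random
from typing import List


def dropout_mask(rng: random.Random, size: int, p: float) -> List[float]:
    """Inverted-dropout mask: drop with probability p, scale kept units by 1/(1-p)."""
    scale = 1.0 / (1.0 - p)
    return [0.0 if rng.random() < p else scale for _ in range(size)]


if __name__ == "__main__":
    rng = random.Random(0)
    saved_state = rng.getstate()  # memorize RNG state before the forward pass
    forward_mask = dropout_mask(rng, 8, 0.5)

    # Naive re-computation: the RNG has advanced, so the mask usually differs.
    naive_mask = dropout_mask(rng, 8, 0.5)

    # Checkpoint-aware re-computation: restore the saved state first.
    rng.setstate(saved_state)
    replayed_mask = dropout_mask(rng, 8, 0.5)

    print(forward_mask == naive_mask, forward_mask == replayed_mask)
```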

Health Check

Last Commit

6 years ago

Responsiveness

Inactive

Pull Requests (30d)

0

Issues (30d)

0

Star History

0 stars in the last 30 days

Explore Similar Projects

InfLLM by thunlp

Research paper code for long-sequence LLM processing via training-free memory

Created 1 year ago

Updated 1 year ago

Starred by

Ying Sheng (Coauthor of SGLang) and

Stas Bekman (Author of "Machine Learning Engineering Open Book"; Research Engineer at Snowflake).

llm-analysis by cli99

CLI tool for LLM latency/memory analysis during training/inference

Created 2 years ago

Updated 8 months ago

omniserve by mit-han-lab

Unified inference engine for large-scale LLM serving

Created 1 year ago

Updated 10 months ago

Starred by

Patrick von Platen (Author of Hugging Face Diffusers; Research Engineer at Mistral),

Yaowei Zheng (Author of LLaMA-Factory), and

4 more.

ml-cross-entropy by apple

PyTorch module for memory-efficient cross-entropy in LLMs

Created 1 year ago

Updated 3 months ago

Starred by

Pawel Garbacki (Cofounder of Fireworks AI),

Vincent Weisser (Cofounder of Prime Intellect), and

1 more.

gemma-2B-10M by mustafaaljadery

Gemma 2B with 10M context length using Infini-attention

Created 1 year ago

Updated 1 year ago

Starred by

Luis Capelo (Cofounder of Lightning AI),

Phil Wang (Prolific Research Paper Implementer), and

4 more.

koila by rentruewang

Tool to prevent CUDA out-of-memory errors in PyTorch

Created 4 years ago

Updated 1 month ago

memory_reduced_optimizer by adonis-dym

Research paper for memory-reduced deep network training

Created 1 year ago

Updated 11 months ago

Starred by

Pawel Garbacki (Cofounder of Fireworks AI),

Wing Lian (Founder of Axolotl AI), and

2 more.

MeZO by princeton-nlp

Research paper implementation for memory-efficient LM fine-tuning

Created 2 years ago

Updated 2 years ago

Starred by

Luca Antiga (CTO of Lightning AI),

William Falcon (Founder of Lightning AI), and

4 more.

lightning-thunder by Lightning-AI

PyTorch compiler for model optimization via source-to-source transformation

Created 1 year ago

Updated 1 day ago

PyTorchTricks by lartpang

Collection of PyTorch performance optimization tricks

Created 6 years ago

Updated 1 year ago

Starred by

Vincent Weisser (Cofounder of Prime Intellect),

Daniel Han (Cofounder of Unsloth), and

1 more.

GaLore by jiaweizzhao

Memory-efficient training for large language models via gradient low-rank projection

Created 1 year ago

Updated 1 year ago

Starred by

Andrej Karpathy (Founder of Eureka Labs; Formerly at Tesla, OpenAI; Author of CS 231n),

Chip Huyen (Author of "AI Engineering", "Designing Machine Learning Systems"), and

12 more.

Liger-Kernel by linkedin

Triton kernels for efficient LLM training

Created 1 year ago

Updated 4 days ago
