koila by rentruewang

Tool to prevent CUDA out-of-memory errors in PyTorch

Created 4 years ago

1,833 stars

Top 23.4% on SourcePulse


6 Experts Love This Project

Luis Capelo, Cofounder of Lightning AI
Phil Wang, Prolific Research Paper Implementer
Omar Sanseviero, DevRel at Google DeepMind
Alexander Borzunov, Research Scientist at OpenAI
and 2 more

Project Summary

Koila is a Python library that aims to prevent PyTorch's common "CUDA out of memory" errors with a single line of code. It is aimed at PyTorch users who run into GPU memory limits during model training, and it works by automatically managing batch sizes and evaluating computation lazily.

How It Works

Koila acts as a lightweight wrapper around PyTorch tensors. It employs a lazy evaluation strategy, similar to TensorFlow's static graphs, to build a computational graph before execution. By analyzing the shapes of intermediate tensors, Koila can predict memory requirements and dynamically adjust batch sizes to fit available GPU memory, preventing out-of-memory errors. It also automatically splits batches into powers of two for potential speedups.
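The idea of predicting memory from shapes alone can be illustrated with a small, dependency-free sketch. This is not Koila's implementation: the `LazyTensor` class, `graph_bytes`, `forward`, and `largest_power_of_two_batch` are hypothetical names invented for illustration, the model is a made-up two-layer network, and the memory estimate simply sums the sizes of all tensors in the graph at 4 bytes per element.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class LazyTensor:
    """Hypothetical stand-in that records a shape but allocates no data."""

    shape: tuple
    parents: tuple = ()

    def matmul(self, other):
        # (batch, n) @ (n, m) -> (batch, m); only the result shape is computed.
        assert self.shape[-1] == other.shape[0], "inner dimensions must match"
        return LazyTensor(self.shape[:-1] + other.shape[1:], (self, other))

    def relu(self):
        # Elementwise operations keep the input shape.
        return LazyTensor(self.shape, (self,))


def graph_bytes(node, bytes_per_element=4):
    """Sum the sizes of every distinct tensor reachable from `node`."""
    seen, stack, total = set(), [node], 0
    while stack:
        current = stack.pop()
        if id(current) in seen:
            continue
        seen.add(id(current))
        total += prod(current.shape) * bytes_per_element
        stack.extend(current.parents)
    return total


def forward(batch, w1, w2):
    """Build the graph for an assumed two-layer network; nothing is executed."""
    x = LazyTensor((batch, w1.shape[0]))
    hidden = x.matmul(w1).relu()
    return hidden.matmul(w2)


def largest_power_of_two_batch(build, requested, budget_bytes):
    """Largest power-of-two batch <= requested whose graph fits the budget."""
    batch = 1
    while batch * 2 <= requested:
        batch *= 2
    while graph_bytes(build(batch)) > budget_bytes:
        if batch == 1:
            raise MemoryError("even a single sample exceeds the budget")
        batch //= 2
    return batch


# Assumed layer sizes and an assumed 64 MiB memory budget.
w1 = LazyTensor((1024, 4096))
w2 = LazyTensor((4096, 10))
chosen = largest_power_of_two_batch(
    lambda b: forward(b, w1, w2), requested=5000, budget_bytes=64 * 1024 * 1024
)
print("chosen batch size:", chosen)
```

Because the graph is built from shapes only, trying several candidate batch sizes costs almost nothing. A real implementation would also need to account for gradients, optimizer state, and allocator overhead, which this sketch ignores.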

Quick Start & Requirements

Install via pip: pip install koila
Requires PyTorch.
Refer to the v0.1.1 tag for a proof-of-concept.

Highlighted Details

Prevents CUDA out-of-memory errors with a single line of code.
Automatically accumulates gradients for large batch sizes.
Lazily evaluates PyTorch code to save computing power.
Splits batch dimensions automatically for GPU efficiency.
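Gradient accumulation is what makes splitting a batch safe: for a loss averaged over samples, the size-weighted sum of per-chunk gradients equals the full-batch gradient. The sketch below checks this in plain Python for a one-parameter linear model. The function names `mse_grad` and `accumulated_grad` and the toy data are assumptions made for illustration and are not taken from Koila.

```python
def mse_grad(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to the scalar weight w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, chunk):
    """Process the batch in chunks and accumulate a size-weighted gradient."""
    total = len(xs)
    grad = 0.0
    for start in range(0, total, chunk):
        cx = xs[start:start + chunk]
        cy = ys[start:start + chunk]
        # Weight each chunk's mean gradient by its share of the full batch.
        grad += mse_grad(w, cx, cy) * len(cx) / total
    return grad


# Toy data: y = 2x + 1 for x = 1..8, evaluated at w = 0.5.
xs = [float(i) for i in range(1, 9)]
ys = [2.0 * x + 1.0 for x in xs]
full = mse_grad(0.5, xs, ys)
for chunk in (1, 2, 4, 8):
    print(chunk, abs(full - accumulated_grad(0.5, xs, ys, chunk)) < 1e-9)
```

The weighting by `len(cx) / total` matters when the last chunk is smaller than the others; with equal-sized chunks it reduces to a plain average of the chunk gradients.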

Maintenance & Community

The project is currently undergoing a significant restructuring, and the main branch is largely empty as a result. The v0.1.1 tag represents a working proof of concept. The project is available under the Apache License.

Licensing & Compatibility

License: Apache License.
Intended to work with existing PyTorch code, though compatibility is not yet complete.

Limitations & Caveats

The library is a work in progress and is not yet fully compatible with PyTorch, owing to limited development time. It is not recommended for production environments. The main branch is mostly empty because of the ongoing restructuring.

Health Check

Last Commit: 1 month ago
Responsiveness: 1 day

Star History

1 star in the last 30 days