Gradient Cache: scale contrastive learning batch sizes beyond GPU memory limits
This repository provides Gradient Cache, a technique to scale contrastive learning batch sizes beyond GPU memory limits, enabling training on single GPUs that previously required multiple. It targets researchers and engineers working with large-scale contrastive learning models, offering significant cost and hardware efficiency benefits.
How It Works
Gradient Cache splits a large contrastive batch into smaller chunks and decouples the loss computation from encoder back-propagation. It first runs a graph-free forward pass over the chunks to collect their representations, computes the full-batch contrastive loss on those representations, and caches the gradient of the loss with respect to each representation. It then re-runs the forward pass chunk by chunk with autograd enabled and back-propagates the cached representation gradients into the encoders. Peak memory is therefore bounded by the chunk size rather than the full batch, while the resulting parameter gradients match full-batch training, enabling much larger effective batch sizes on limited hardware.
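The sketch below illustrates the idea from scratch for a two-tower retrieval setup. It is not the library's implementation; the encoder and loss names are illustrative placeholders.

```python
import torch
import torch.nn.functional as F


def contrastive_loss(q_reps, p_reps):
    # In-batch negatives: the positive passage for query i sits at index i.
    scores = q_reps @ p_reps.T
    targets = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, targets)


def grad_cache_step(query_encoder, passage_encoder, queries, passages, chunk_size):
    # 1) Graph-free forward over chunks to collect representations only.
    with torch.no_grad():
        q_reps = torch.cat([query_encoder(c) for c in queries.split(chunk_size)])
        p_reps = torch.cat([passage_encoder(c) for c in passages.split(chunk_size)])

    # 2) Full-batch loss on the detached representations; cache the gradient
    #    of the loss with respect to each representation.
    q_reps.requires_grad_()
    p_reps.requires_grad_()
    loss = contrastive_loss(q_reps, p_reps)
    loss.backward()
    q_cache = q_reps.grad.split(chunk_size)
    p_cache = p_reps.grad.split(chunk_size)

    # 3) Re-run each chunk with autograd enabled and push the cached
    #    representation gradients back into the encoder parameters.
    for chunk, rep_grad in zip(queries.split(chunk_size), q_cache):
        query_encoder(chunk).backward(gradient=rep_grad)
    for chunk, rep_grad in zip(passages.split(chunk_size), p_cache):
        passage_encoder(chunk).backward(gradient=rep_grad)

    return loss.detach()
```

After the step returns, a regular optimizer.step() applies parameter gradients equivalent to a single full-batch update.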
Quick Start & Requirements
Install from source:

```bash
git clone https://github.com/luyug/GradCache && cd GradCache && pip install .
```

The library builds on PyTorch; mixed-precision training is supported through torch.cuda.amp.GradScaler.
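A usage sketch of the class-based API is shown below, assuming the constructor arguments models, chunk_sizes, loss_fn, fp16, and scaler; verify the exact names against the installed version. The toy linear encoders and loss stand in for real models.

```python
import torch
from torch import nn
from torch.cuda.amp import GradScaler
from grad_cache import GradCache

device = "cuda"  # fp16 caching assumes a CUDA device

# Toy two-tower encoders; substitute real models (e.g. Transformer encoders).
encoder_q = nn.Linear(128, 64).to(device)
encoder_p = nn.Linear(128, 64).to(device)
optimizer = torch.optim.AdamW(
    [*encoder_q.parameters(), *encoder_p.parameters()], lr=1e-5
)
scaler = GradScaler()


def contrastive_loss(q_reps, p_reps):
    scores = q_reps @ p_reps.T
    targets = torch.arange(scores.size(0), device=scores.device)
    return nn.functional.cross_entropy(scores, targets)


gc = GradCache(
    models=[encoder_q, encoder_p],
    chunk_sizes=8,              # sub-batch size per chunk; tune to GPU memory
    loss_fn=contrastive_loss,
    fp16=True,
    scaler=scaler,
)

# One training step on a batch far larger than a single forward pass would allow.
queries = torch.randn(256, 128, device=device)
passages = torch.randn(256, 128, device=device)

optimizer.zero_grad()
loss = gc(queries, passages)    # chunked forward/backward passes happen inside
scaler.step(optimizer)
scaler.update()
```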
Highlighted Details
Key points include chunked gradient computation with a tunable chunk_sizes setting, customizable input splitting via split_input_fn, and mixed-precision support through torch.cuda.amp.GradScaler.
Maintenance & Community
The project is associated with authors Luyu Gao, Yunyi Zhang, Jiawei Han, and Jamie Callan. Further community engagement channels are not explicitly mentioned in the README.
Licensing & Compatibility
The repository does not explicitly state a license in the provided README text. Users should verify licensing for commercial or closed-source use.
Limitations & Caveats
The README mentions that generic input types not explicitly handled may require a custom split_input_fn. The effectiveness of chunk_sizes depends on GPU memory utilization and typically requires tuning.
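As an illustration, a custom splitter for dictionary-of-tensor inputs (e.g. tokenizer output) might look like the sketch below. The (model_input, chunk_size) signature and the list-of-chunks return format are assumptions to verify against the installed version.

```python
# Hypothetical split_input_fn for dict-of-tensor inputs such as tokenizer output.
# Assumed contract: called as split_input_fn(model_input, chunk_size) and expected
# to return a list of chunk-sized inputs in the same format.
def split_dict_input(model_input, chunk_size):
    keys = list(model_input.keys())
    chunked = {k: model_input[k].split(chunk_size, dim=0) for k in keys}
    n_chunks = len(chunked[keys[0]])
    return [{k: chunked[k][i] for k in keys} for i in range(n_chunks)]
```

Such a function would be passed to the GradCache constructor as split_input_fn=split_dict_input.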