Memory-efficient optimizer for large language model finetuning
Top 96.5% on SourcePulse
BAdam makes full-parameter fine-tuning of large language models memory-efficient by applying Adam's update rule to one small block of parameters at a time. This sequential scheme significantly reduces memory requirements, enabling fine-tuning of models like Llama 3-8B on a single RTX3090 while achieving competitive or superior performance compared to LoRA.
How It Works
BAdam implements block coordinate optimization by iterating through partitions of the model's parameters (e.g., individual transformer layers). For a specified number of updates (`switch_block_every`), it applies the Adam optimizer only to the currently active block, keeping all other parameters frozen. This sequential block processing drastically lowers peak memory usage for optimizer states and gradients. The library provides flexibility in defining these blocks, from entire layers down to specific matrix modules, and supports model parallelism via DeepSpeed ZeRO-3 for distributed training.
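To make the mechanism concrete, here is a minimal, self-contained sketch of block coordinate Adam in plain PyTorch. It is not BAdam's internal implementation; the toy model, block partition, and hyperparameters are illustrative assumptions, but it shows why optimizer-state and gradient memory scale with a single block rather than the full model.

```python
import torch
import torch.nn as nn

# Toy stand-in for a stack of transformer layers.
model = nn.Sequential(*[nn.Linear(256, 256) for _ in range(8)])
loss_fn = nn.MSELoss()

# One block per layer; finer partitions (e.g., per matrix module) work the same way.
blocks = [list(layer.parameters()) for layer in model]
switch_block_every = 50  # Adam steps before switching to the next block

for block_params in blocks:
    # Freeze everything, then unfreeze only the active block.
    for p in model.parameters():
        p.requires_grad_(False)
    for p in block_params:
        p.requires_grad_(True)

    # Adam states (m, v) are allocated for the active block only,
    # which is where the memory savings come from.
    optimizer = torch.optim.Adam(block_params, lr=1e-3)
    for _ in range(switch_block_every):
        x = torch.randn(32, 256)
        loss = loss_fn(model(x), torch.randn(32, 256))
        loss.backward()  # gradients exist only for the active block
        optimizer.step()
        optimizer.zero_grad()
```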
Quick Start & Requirements
Install from PyPI: pip install badam
Or install from source: git clone https://github.com/Ledzy/BAdam.git && cd BAdam && pip install -e .
Or set up a dedicated environment: conda create -n badam python=3.10 && conda activate badam && pip install -r requirements.txt
Training requires bfloat16 support (`bf16=True`) and an NVIDIA GPU (tested with an RTX3090).
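After installation, wrapping an existing optimizer could look like the sketch below. The `BlockOptimizer` class name and the `base_optimizer`, `named_parameters_list`, `switch_block_every`, `switch_mode`, and `verbose` arguments are assumptions based on the project's documented interface; check the repository for the current API.

```python
import torch
import torch.nn as nn
from badam import BlockOptimizer  # assumed import path

# Placeholder model; in practice this would be a causal LM loaded in bf16.
model = nn.Sequential(*[nn.Linear(128, 128) for _ in range(4)])

base_optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
optimizer = BlockOptimizer(
    base_optimizer=base_optimizer,                         # any Adam-style optimizer
    named_parameters_list=list(model.named_parameters()),  # defines the blocks
    switch_block_every=100,                                # updates per active block
    switch_mode="random",                                  # block visiting order (assumed option)
    verbose=2,
)

# Then train as usual: loss.backward(); optimizer.step(); optimizer.zero_grad()
```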
Highlighted Details
`switch_block_every` hyperparameter suggestion: `min(max(n/(BD), 50), 100)`.
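As an illustrative reading of this rule (assuming n is the number of training samples, B the batch size, and D the number of blocks, which is an interpretation rather than something stated above):

```python
# Illustrative arithmetic only; the meanings of n, B, and D are assumptions.
n, B, D = 50_000, 16, 32            # training samples, batch size, number of blocks
value = min(max(n / (B * D), 50), 100)
print(value)                        # n/(B*D) ~= 97.7, already inside [50, 100]
```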
Maintenance & Community
Last updated 5 months ago; activity status: inactive.
Licensing & Compatibility
Limitations & Caveats
`BlockOptimizerRatio` is under active development; it currently supports only Adam updates and may incur overhead from gradient sparsification.