Finetuning walkthrough for Whisper ASR models
This repository provides a streamlined walkthrough for fine-tuning OpenAI's Whisper model using Parameter-Efficient Fine-Tuning (PEFT) techniques, specifically LoRA. It targets users with consumer GPUs and limited VRAM, enabling faster training and significantly smaller checkpoint sizes compared to full fine-tuning, while maintaining comparable performance.
How It Works
The project applies LoRA by injecting trainable rank-decomposition matrices into the Transformer layers while freezing the original model weights. This drastically reduces the number of trainable parameters (to less than 1%) and the memory requirements. The approach uses `bitsandbytes` for 8-bit quantization and `accelerate` for distributed training, enabling fine-tuning of large models on hardware with as little as 8 GB of VRAM.
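A minimal sketch of that setup is shown below; the checkpoint name, LoRA rank, and target modules are illustrative assumptions rather than values fixed by this walkthrough, and newer `peft` releases rename the int8 helper to `prepare_model_for_kbit_training`.

```python
# Sketch: load Whisper with 8-bit weights and attach LoRA adapters via peft.
# The checkpoint name and LoRA hyperparameters are illustrative assumptions.
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

# Base model quantized to 8-bit by bitsandbytes; the original weights stay frozen.
model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v2", load_in_8bit=True, device_map="auto"
)

# Freeze base weights and cast selected layers (e.g. layer norms) for stable int8 training.
# (Newer peft versions call this prepare_model_for_kbit_training.)
model = prepare_model_for_int8_training(model)

# Inject trainable low-rank matrices into the attention projections.
lora_config = LoraConfig(
    r=32, lora_alpha=64, target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05, bias="none",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically reports well under 1% trainable parameters
```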
Quick Start & Requirements
pip install -q transformers datasets librosa evaluate jiwer gradio bitsandbytes==0.37 accelerate
pip install -q git+https://github.com/huggingface/peft.git@main
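With the dependencies in place, a typical first step is to load and preprocess an audio dataset. The sketch below uses the small public LibriSpeech dummy set and the `openai/whisper-large-v2` checkpoint purely as placeholder assumptions; the walkthrough itself does not prescribe a specific dataset.

```python
# Sketch: dataset loading and feature extraction for Whisper fine-tuning.
# The dataset and checkpoint names are placeholder assumptions.
from datasets import load_dataset, Audio
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained(
    "openai/whisper-large-v2", language="en", task="transcribe"
)

ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
ds = ds.cast_column("audio", Audio(sampling_rate=16000))  # Whisper expects 16 kHz audio

def prepare(batch):
    audio = batch["audio"]
    # Log-Mel spectrogram features for the encoder, token ids for the labels.
    batch["input_features"] = processor.feature_extractor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    batch["labels"] = processor.tokenizer(batch["text"]).input_ids
    return batch

ds = ds.map(prepare, remove_columns=ds.column_names)
```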
Highlighted Details
Supports inference through the Hugging Face `pipeline` API; see the sketch below.
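A rough sketch of that inference path, assuming a hypothetical adapter directory `my-whisper-lora` produced by an earlier training run:

```python
# Sketch: inference through the transformers ASR pipeline with a LoRA adapter.
# "my-whisper-lora" is a hypothetical placeholder for a trained adapter directory.
import torch
from transformers import (AutomaticSpeechRecognitionPipeline,
                          WhisperForConditionalGeneration, WhisperProcessor)
from peft import PeftModel

base = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v2", load_in_8bit=True, device_map="auto"
)
model = PeftModel.from_pretrained(base, "my-whisper-lora")  # attach the trained adapter
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2")

asr = AutomaticSpeechRecognitionPipeline(
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
)

# Autocast avoids dtype mismatches when generating from the 8-bit base model.
with torch.cuda.amp.autocast():
    print(asr("sample.wav")["text"])  # path to a local audio file
```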
Maintenance & Community
Licensing & Compatibility
The walkthrough builds on Hugging Face libraries (`transformers`, `datasets`, `peft`), which are typically under Apache 2.0 or similar permissive licenses.
Limitations & Caveats
The example configuration sets `max_steps=100` for demonstration, which should be removed for full training.
The int8/PEFT setup does not support `predict_with_generate` and `compute_metrics` within the `Trainer`, so evaluation metrics such as word error rate have to be computed in a separate loop (see the sketch below).
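Because of that restriction, WER is typically measured with a hand-written evaluation loop after training. The sketch below assumes that `model`, `processor`, and an `eval_dataloader` yielding padded `input_features` and `labels` already exist from the earlier steps.

```python
# Sketch: manual WER evaluation outside the Trainer.
# Assumes `model`, `processor`, and `eval_dataloader` were created earlier;
# label positions padded with -100 are restored to the pad token before decoding.
import torch
import evaluate

wer_metric = evaluate.load("wer")  # backed by jiwer
device = next(model.parameters()).device
model.eval()

predictions, references = [], []
with torch.no_grad():
    for batch in eval_dataloader:
        generated = model.generate(
            input_features=batch["input_features"].to(device),
            max_new_tokens=255,
        )
        labels = batch["labels"].clone()
        labels[labels == -100] = processor.tokenizer.pad_token_id
        predictions += processor.batch_decode(generated, skip_special_tokens=True)
        references += processor.batch_decode(labels, skip_special_tokens=True)

print(f"WER: {100 * wer_metric.compute(predictions=predictions, references=references):.2f}")
```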