fast-whisper-finetuning by Vaibhavs10

Finetuning walkthrough for Whisper ASR models

created 2 years ago
533 stars

Top 60.2% on sourcepulse

Project Summary

This repository provides a streamlined walkthrough for fine-tuning OpenAI's Whisper model using Parameter-Efficient Fine-Tuning (PEFT) techniques, specifically LoRA. It targets users with consumer GPUs and limited VRAM, enabling faster training and significantly smaller checkpoint sizes compared to full fine-tuning, while maintaining comparable performance.

How It Works

The project leverages LoRA, injecting trainable rank-decomposition matrices into the Transformer layers while freezing the original model weights. This drastically reduces the number of trainable parameters (to under 1%) and the memory requirements. The approach uses bitsandbytes for 8-bit quantization and accelerate for distributed training, enabling fine-tuning of large models on hardware with as little as 8GB of VRAM.
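
A minimal sketch of that setup, assuming the usual Transformers/PEFT APIs (the LoRA rank, alpha, dropout, and target modules below are illustrative, not necessarily the repository's exact values):

```python
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

# Load the frozen base model with 8-bit weights via bitsandbytes.
# (Newer Transformers versions prefer a BitsAndBytesConfig over load_in_8bit.)
model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v2", load_in_8bit=True, device_map="auto"
)
# Newer PEFT releases rename this helper to prepare_model_for_kbit_training.
model = prepare_model_for_int8_training(model)

# Inject trainable low-rank adapters into the attention projections;
# everything else stays frozen.
lora_config = LoraConfig(
    r=32, lora_alpha=64, target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05, bias="none"
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically reports well under 1% trainable parameters
```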

Quick Start & Requirements

  • Install: pip install -q transformers datasets librosa evaluate jiwer gradio bitsandbytes==0.37 accelerate, then pip install -q git+https://github.com/huggingface/peft.git@main
  • Prerequisites: Python, a Hugging Face Hub token, and a CUDA-enabled GPU (tested on a Google Colab T4; training fits in under 8GB of VRAM).
  • Setup: Requires downloading a dataset (e.g., Common Voice 13.0) and accepting its terms on the Hugging Face Hub. Data preparation involves resampling audio to 16 kHz and tokenizing the transcripts (see the sketch after this list).
  • Docs: Hugging Face PEFT
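
As a rough illustration of the data-preparation step (the dataset configuration, the language code "hi", and the column names follow the Common Voice schema and are assumptions, not necessarily what the repository uses):

```python
from datasets import load_dataset, Audio
from transformers import WhisperProcessor

# Requires a Hugging Face login and accepting the Common Voice terms on the Hub.
cv = load_dataset("mozilla-foundation/common_voice_13_0", "hi", split="train")
cv = cv.cast_column("audio", Audio(sampling_rate=16_000))  # resample to Whisper's 16 kHz

processor = WhisperProcessor.from_pretrained(
    "openai/whisper-large-v2", language="hi", task="transcribe"
)

def prepare(batch):
    audio = batch["audio"]
    # Log-Mel input features from the raw waveform
    batch["input_features"] = processor.feature_extractor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    # Tokenized transcript used as decoder labels
    batch["labels"] = processor.tokenizer(batch["sentence"]).input_ids
    return batch

cv = cv.map(prepare, remove_columns=cv.column_names)
```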

Highlighted Details

  • Fine-tunes Whisper-large-v2 on <8GB VRAM.
  • Achieves comparable performance to full fine-tuning with <1% trainable parameters.
  • Checkpoint sizes are <1% of the original model (~60MB).
  • Integrates with the 🤗 Transformers pipeline for inference (see the sketch below).
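
A hedged inference sketch, assuming the trained LoRA adapter was pushed to the Hub (the adapter repo id below is a placeholder):

```python
from transformers import (
    AutomaticSpeechRecognitionPipeline,
    WhisperForConditionalGeneration,
    WhisperProcessor,
)
from peft import PeftModel

# Load the 8-bit base model, then attach the trained LoRA adapter on top of it.
base = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v2", load_in_8bit=True, device_map="auto"
)
model = PeftModel.from_pretrained(base, "your-username/whisper-large-v2-lora")  # placeholder repo id

processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2")
asr = AutomaticSpeechRecognitionPipeline(
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
)
print(asr("sample.wav")["text"])
```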

Maintenance & Community

  • Developed by Vaibhavs10.
  • Builds on the Hugging Face ecosystem, in particular the PEFT library.
  • Encourages sharing results on Twitter (@huggingface, @reach_vb).

Licensing & Compatibility

  • The repository itself does not explicitly state a license.
  • Relies on Hugging Face libraries (transformers, datasets, peft), which are typically under Apache 2.0 or similar permissive licenses.
  • Compatible with commercial use as long as underlying Hugging Face library licenses are respected.

Limitations & Caveats

  • The provided Colab notebook is a demonstration: it sets max_steps=100, which should be removed for a full training run.
  • INT8 training requires specific handling of predict_with_generate and compute_metrics within the Trainer.
  • Forced decoder IDs are necessary during inference to ensure decoding in the correct language (see the sketch below).
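
A small sketch of the forced-decoder-IDs caveat (the language and task are illustrative; `asr` is the pipeline from the inference sketch above):

```python
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2")
# Pin the decoder to the fine-tuning language/task so generation does not
# drift into the wrong language.
forced_decoder_ids = processor.get_decoder_prompt_ids(language="hi", task="transcribe")

# Reuse the `asr` pipeline from the inference sketch above.
text = asr("sample.wav", generate_kwargs={"forced_decoder_ids": forced_decoder_ids})["text"]
print(text)
```
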
Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 21 stars in the last 90 days

Explore Similar Projects

Starred by Tobi Lutke (Cofounder of Shopify), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 10 more.

  • qlora by artidoro — Finetuning tool for quantized LLMs. Top 0.2%, 11k stars; created 2 years ago, updated 1 year ago.