fast-whisper-finetuning by Vaibhavs10

Finetuning walkthrough for Whisper ASR models

created 2 years ago
533 stars

Top 60.2% on sourcepulse

Project Summary

This repository provides a streamlined walkthrough for fine-tuning OpenAI's Whisper model using Parameter-Efficient Fine-Tuning (PEFT) techniques, specifically LoRA. It targets users with consumer GPUs and limited VRAM, enabling faster training and significantly smaller checkpoint sizes compared to full fine-tuning, while maintaining comparable performance.

How It Works

The project leverages LoRA, injecting trainable rank-decomposition matrices into the Transformer layers while freezing the original model weights. This drastically reduces the number of trainable parameters (to under 1%) and the memory requirements. The approach uses bitsandbytes for 8-bit quantization and accelerate for distributed training, enabling fine-tuning of large models on hardware with as little as 8GB of VRAM.
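
A minimal sketch of that setup, assuming the usual Transformers/PEFT APIs (the LoRA rank, alpha, dropout, and target modules below are illustrative, not necessarily the repository's exact values):

```python
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

# Load the frozen base model with 8-bit weights via bitsandbytes.
# (Newer Transformers versions prefer a BitsAndBytesConfig over load_in_8bit.)
model = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v2", load_in_8bit=True, device_map="auto"
)
# Newer PEFT releases rename this helper to prepare_model_for_kbit_training.
model = prepare_model_for_int8_training(model)

# Inject trainable low-rank adapters into the attention projections;
# everything else stays frozen.
lora_config = LoraConfig(
    r=32, lora_alpha=64, target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05, bias="none"
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # typically reports well under 1% trainable parameters
```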

Quick Start & Requirements

  • Install: pip install -q transformers datasets librosa evaluate jiwer gradio bitsandbytes==0.37 accelerate, then pip install -q git+https://github.com/huggingface/peft.git@main
  • Prerequisites: Python, a Hugging Face Hub token, and a CUDA-enabled GPU (tested on a Google Colab T4; training fits in under 8GB of VRAM).
  • Setup: Requires downloading a dataset (e.g., Common Voice 13.0) and accepting its terms on the Hugging Face Hub. Data preparation involves resampling audio to 16 kHz and tokenizing the transcripts (see the sketch after this list).
  • Docs: Hugging Face PEFT
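
As a rough illustration of the data-preparation step (the dataset configuration, the language code "hi", and the column names follow the Common Voice schema and are assumptions, not necessarily what the repository uses):

```python
from datasets import load_dataset, Audio
from transformers import WhisperProcessor

# Requires a Hugging Face login and accepting the Common Voice terms on the Hub.
cv = load_dataset("mozilla-foundation/common_voice_13_0", "hi", split="train")
cv = cv.cast_column("audio", Audio(sampling_rate=16_000))  # resample to Whisper's 16 kHz

processor = WhisperProcessor.from_pretrained(
    "openai/whisper-large-v2", language="hi", task="transcribe"
)

def prepare(batch):
    audio = batch["audio"]
    # Log-Mel input features from the raw waveform
    batch["input_features"] = processor.feature_extractor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    # Tokenized transcript used as decoder labels
    batch["labels"] = processor.tokenizer(batch["sentence"]).input_ids
    return batch

cv = cv.map(prepare, remove_columns=cv.column_names)
```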

Highlighted Details

  • Fine-tunes Whisper-large-v2 on <8GB VRAM.
  • Achieves comparable performance to full fine-tuning with <1% trainable parameters.
  • Checkpoint sizes are <1% of the original model (~60MB).
  • Integrates with the 🤗 Transformers pipeline for inference (see the sketch below).
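
A hedged inference sketch, assuming the trained LoRA adapter was pushed to the Hub (the adapter repo id below is a placeholder):

```python
from transformers import (
    AutomaticSpeechRecognitionPipeline,
    WhisperForConditionalGeneration,
    WhisperProcessor,
)
from peft import PeftModel

# Load the 8-bit base model, then attach the trained LoRA adapter on top of it.
base = WhisperForConditionalGeneration.from_pretrained(
    "openai/whisper-large-v2", load_in_8bit=True, device_map="auto"
)
model = PeftModel.from_pretrained(base, "your-username/whisper-large-v2-lora")  # placeholder repo id

processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2")
asr = AutomaticSpeechRecognitionPipeline(
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
)
print(asr("sample.wav")["text"])
```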

Maintenance & Community

  • Developed by Vaibhavs10.
  • Builds on the Hugging Face ecosystem, in particular the PEFT library.
  • Encourages sharing results on Twitter (@huggingface, @reach_vb).

Licensing & Compatibility

  • The repository itself does not explicitly state a license.
  • Relies on Hugging Face libraries (transformers, datasets, peft), which are typically under Apache 2.0 or similar permissive licenses.
  • Compatible with commercial use as long as underlying Hugging Face library licenses are respected.

Limitations & Caveats

  • The provided Colab notebook is a demonstration: it sets max_steps=100, which should be removed for a full training run.
  • INT8 training requires specific handling of predict_with_generate and compute_metrics within the Trainer.
  • Forced decoder IDs are necessary during inference to ensure decoding in the correct language (see the sketch below).
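
A small sketch of the forced-decoder-IDs caveat (the language and task are illustrative; `asr` is the pipeline from the inference sketch above):

```python
from transformers import WhisperProcessor

processor = WhisperProcessor.from_pretrained("openai/whisper-large-v2")
# Pin the decoder to the fine-tuning language/task so generation does not
# drift into the wrong language.
forced_decoder_ids = processor.get_decoder_prompt_ids(language="hi", task="transcribe")

# Reuse the `asr` pipeline from the inference sketch above.
text = asr("sample.wav", generate_kwargs={"forced_decoder_ids": forced_decoder_ids})["text"]
print(text)
```
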
Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1+ week
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 21 stars in the last 90 days

Explore Similar Projects

Starred by Tobi Lutke (Cofounder of Shopify), Chip Huyen (Author of AI Engineering, Designing Machine Learning Systems), and 10 more.

  • qlora by artidoro — Finetuning tool for quantized LLMs. Top 0.2%, 11k stars; created 2 years ago, updated 1 year ago.