Whisper finetuning and inference toolkit
Top 34.2% on SourcePulse
This repository provides tools and scripts for fine-tuning OpenAI's Whisper speech recognition model using LoRA. It supports training with or without timestamp data, and even without speech data, enabling customization for specific domains or languages. The project also offers accelerated inference options and deployment solutions for web, Windows desktop, and Android applications.
How It Works
The core of the project involves fine-tuning Whisper using the LoRA (Low-Rank Adaptation) technique, which allows for efficient adaptation of large pre-trained models with significantly fewer trainable parameters. This approach enables training on diverse datasets, including those lacking timestamp information or even speech content for specific tasks. For inference, it leverages CTranslate2 and GGML for accelerated performance, and integrates with Hugging Face's Transformers library for broader compatibility.
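The parameter savings from LoRA can be illustrated with plain NumPy: the frozen weight W is augmented by a low-rank update (alpha / r) * B @ A, and only A and B are trained. This is a generic sketch of the technique, not code from the repository; the layer width and rank are made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 512, 8  # hypothetical layer width and LoRA rank
alpha = 16     # hypothetical scaling factor

W = rng.standard_normal((d, d))          # frozen pretrained weight (not trained)
A = rng.standard_normal((r, d)) * 0.01   # trainable down-projection
B = np.zeros((d, r))                     # trainable up-projection, zero-initialized
                                         # so the model starts identical to the base

# Effective weight seen during and after fine-tuning:
W_adapted = W + (alpha / r) * (B @ A)

full = W.size          # parameters a full fine-tune would update
lora = A.size + B.size # parameters LoRA actually trains
print(f"trainable params: {lora} vs full fine-tune {full} ({lora / full:.1%})")
```

With these numbers LoRA trains about 3% of the layer's parameters, which is why the approach makes adapting a large Whisper checkpoint feasible on modest hardware.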
Quick Start & Requirements
- Install dependencies: pip install -r requirements.txt (or use the provided Docker image pytorch/pytorch:2.4.0-cuda11.8-cudnn9-devel).
- Install bitsandbytes from a specific GitHub release.
- An aishell.py script is provided for processing the AIShell dataset.

Highlighted Details
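One highlighted capability is the CTranslate2-accelerated inference path mentioned under How It Works. A minimal sketch using the faster-whisper package (an assumption on my part — the repository ships its own scripts, and the model path here is a placeholder):

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS.mmm for readable transcript lines."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"

if __name__ == "__main__":
    # Imported lazily so the helper above works without the library installed.
    from faster_whisper import WhisperModel

    # "whisper-small-ct2" is a placeholder for a CTranslate2-converted checkpoint.
    model = WhisperModel("whisper-small-ct2", device="cuda", compute_type="float16")
    segments, info = model.transcribe("sample.wav", beam_size=5)
    for seg in segments:
        print(f"[{format_timestamp(seg.start)} -> {format_timestamp(seg.end)}] {seg.text}")
```

CTranslate2 runs the model with quantization and fused kernels, which is where the speedup over vanilla PyTorch inference comes from.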
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats