Whisper fine-tuning scripts for ASR tasks
This repository provides scripts for fine-tuning and evaluating OpenAI's Whisper models for Automatic Speech Recognition (ASR) on custom or Hugging Face datasets. It targets researchers and developers who want to adapt Whisper to specific languages, accents, or noisy audio conditions.
How It Works
The project leverages the Hugging Face Transformers library to load and fine-tune the various Whisper model configurations. It supports both Hugging Face datasets and custom datasets; the latter must be prepared in a specific two-file format (`audio_paths` and `text`). The scripts support distributed training across multiple GPUs and offer options for hyperparameter tuning, including learning-rate recommendations based on model size. An alternative evaluation path using `whisper-jax` is also provided for faster inference.
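The README summarized here does not reproduce the exact per-line layout of the two files, but a common convention pairs an utterance ID with an absolute audio path (in `audio_paths`) and with a transcription (in `text`). The sketch below shows how such files could be loaded into a Hugging Face `Dataset`; the line layout, function name, and paths are illustrative assumptions, not the repository's actual preparation script.

```python
# Illustrative sketch only: the "<utt-id> <value>" line layout is an assumption.
from datasets import Dataset, Audio


def load_custom_dataset(audio_paths_file: str, text_file: str) -> Dataset:
    """Build a Hugging Face Dataset from `audio_paths` and `text` files."""

    def read_kv(path: str) -> dict:
        # Assumed format: one "<utt-id> <value>" pair per line.
        entries = {}
        with open(path, encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue
                utt_id, value = line.strip().split(maxsplit=1)
                entries[utt_id] = value
        return entries

    audio = read_kv(audio_paths_file)
    text = read_kv(text_file)
    common_ids = sorted(set(audio) & set(text))
    data = {
        "audio": [audio[i] for i in common_ids],
        "sentence": [text[i] for i in common_ids],
    }
    # Whisper's feature extractor expects 16 kHz audio.
    return Dataset.from_dict(data).cast_column("audio", Audio(sampling_rate=16_000))


# Hypothetical usage:
# train_ds = load_custom_dataset("data/train/audio_paths", "data/train/text")
```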
Quick Start & Requirements
Install dependencies with `pip install -r requirements.txt` inside a Python 3.8 virtual environment. `git-lfs` is required for pushing fine-tuned models to Hugging Face.
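As a quick sanity check after installation, a stock or fine-tuned Whisper checkpoint can be exercised with the Transformers ASR pipeline; the checkpoint name and audio path below are placeholders, and this snippet is not one of the repository's own scripts.

```python
# Post-install sanity check (checkpoint name and audio path are placeholders).
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",  # or a fine-tuned checkpoint on the Hub
)
result = asr("path/to/sample.wav")
print(result["text"])
```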
Highlighted Details
`whisper-jax` integration for faster evaluation and inference.
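For reference, a minimal sketch of the `whisper-jax` evaluation path might look like the following; the checkpoint name and audio path are placeholders, and the checkpoint must have Flax weights on the Hugging Face Hub (see Limitations below).

```python
# Minimal whisper-jax inference sketch (model name and audio path are placeholders).
from whisper_jax import FlaxWhisperPipline  # note: the class name omits the second "e"

# The checkpoint must ship Flax weights on the Hugging Face Hub.
pipeline = FlaxWhisperPipline("openai/whisper-small")
outputs = pipeline("path/to/sample.wav", task="transcribe")
print(outputs["text"])
```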
Maintenance & Community
The repository is maintained by vasistalodagala and appears inactive, with its last update roughly two years ago. Further community engagement or roadmap details are not mentioned in the README.
Licensing & Compatibility
The repository's license is not stated in the README. However, it relies heavily on the Hugging Face Transformers library, which is released under the Apache 2.0 license and is generally compatible with commercial use.
Limitations & Caveats
Audio segments processed for embedding extraction should not exceed 30 seconds, due to Whisper's positional embedding limitations. The `whisper-jax` integration requires models with Flax weights available on Hugging Face.