Whisper-Finetune by yeyupiaoling

Whisper fine-tuning and inference toolkit

created 2 years ago
1,101 stars

Top 35.3% on SourcePulse

View on GitHub
Project Summary

This repository provides tools and scripts for fine-tuning OpenAI's Whisper speech recognition model using LoRA. It supports training with or without timestamp data, and even without speech data, enabling customization for specific domains or languages. The project also offers accelerated inference options and deployment solutions for web, Windows desktop, and Android applications.

How It Works

The core of the project is fine-tuning Whisper with LoRA (Low-Rank Adaptation), a technique that adapts large pre-trained models efficiently with far fewer trainable parameters. This makes it practical to train on diverse datasets, including datasets without timestamp information and, for some tasks, even without speech content. For inference, the project leverages CTranslate2 and GGML for accelerated performance, and integrates with Hugging Face's Transformers library for broader compatibility.
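
As a concrete sketch of that fine-tuning step, the snippet below attaches LoRA adapters to a Whisper checkpoint with Hugging Face's PEFT library. The rank, alpha, and target modules shown are illustrative assumptions, not the repository's exact settings.

    from transformers import WhisperForConditionalGeneration
    from peft import LoraConfig, get_peft_model

    # Load a pre-trained Whisper checkpoint (any size from tiny to large-v3).
    model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

    # LoRA injects small low-rank matrices into selected attention projections;
    # only those matrices are trained while the base weights stay frozen.
    # Rank, alpha, and target modules here are assumed values for illustration.
    lora_config = LoraConfig(
        r=32,
        lora_alpha=64,
        target_modules=["q_proj", "v_proj"],
        lora_dropout=0.05,
        bias="none",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # typically only a few percent of the model

Because only the injected low-rank matrices receive gradients, the trainable parameter count drops to a small fraction of the full model, which is what makes fine-tuning feasible on a single GPU.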

Quick Start & Requirements

  • Installation: pip install -r requirements.txt (or use the suggested Docker image pytorch/pytorch:2.4.0-cuda11.8-cudnn9-devel).
  • Prerequisites: Python 3.11, PyTorch 2.4.0, CUDA 11.8 (recommended), GPU (A100-PCIE-40GB used in examples). Windows users may need bitsandbytes from a specific GitHub release.
  • Data Preparation: requires data in JSON Lines format; an aishell.py script is provided for processing the AIShell dataset (a sample record is sketched after this list).
  • Resources: Fine-tuning requires significant GPU memory and compute. Inference acceleration options are available.
  • Documentation: Web Deployment, API Docs.
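
For illustration, a single training record in the JSON Lines format might look like the sketch below. The field names mirror the layout commonly used in this author's data scripts, but they are assumptions here and should be verified against the repository's README.

    import json

    # Hypothetical training record; verify field names against the repo's README.
    record = {
        "audio": {"path": "dataset/audio/0001.wav"},
        "sentence": "full transcript of the utterance",
        "duration": 7.37,  # seconds
        "sentences": [     # optional per-segment timestamps
            {"start": 0.0, "end": 7.37, "text": "full transcript of the utterance"},
        ],
    }
    with open("dataset/train.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")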

Highlighted Details

  • Supports fine-tuning Whisper models (tiny, base, small, medium, large, large-v2, large-v3).
  • Offers accelerated inference via CTranslate2 and GGML (see the faster-whisper sketch after this list).
  • Enables deployment to Web (API server), Windows desktop, and Android applications.
  • Includes performance benchmarks showing significant speedups with various optimization techniques (Flash Attention 2, Compile, BetterTransformer).
  • Provides detailed character error rate (CER) and word error rate (WER) test tables for various models and datasets.
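
As a sketch of the CTranslate2 path, a Whisper model converted to CTranslate2 format can be loaded with the faster-whisper package. The model directory and audio file below are placeholder paths, not names from the repository.

    from faster_whisper import WhisperModel

    # Load a CTranslate2-converted Whisper model (placeholder path).
    model = WhisperModel("models/whisper-finetune-ct2",
                         device="cuda", compute_type="float16")

    # transcribe() returns a lazy generator; iterating runs the decoding.
    segments, info = model.transcribe("audio/sample.wav", beam_size=5)
    print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
    for seg in segments:
        print(f"[{seg.start:.2f}s -> {seg.end:.2f}s] {seg.text}")

Running in float16 (or int8) on an optimized runtime is where much of CTranslate2's speedup over the stock PyTorch pipeline comes from.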

Maintenance & Community

  • Active development is implied by the inclusion of recent Whisper versions (large-v3).
  • Community discussion is encouraged via the author's Knowledge Planet community and a QQ group.

Licensing & Compatibility

  • The repository itself does not explicitly state a license. The underlying Whisper model is released under the MIT license.
  • Compatibility for commercial use depends on the licensing of the base Whisper model and any other dependencies.

Limitations & Caveats

  • Some model files and processed datasets are only available through the author's Knowledge Planet, which may require paid membership.
  • The README notes that removing punctuation during evaluation may be necessary for accurate metrics, implying potential issues with punctuation handling in fine-tuned models (a normalization sketch follows).
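
A minimal sketch of that normalization step, assuming the jiwer package for the CER computation (the repository's own evaluation script may differ):

    import re
    import jiwer

    def strip_punct(text: str) -> str:
        # Drop everything that is not a word character or whitespace; this
        # removes ASCII and CJK punctuation while keeping Chinese characters.
        return re.sub(r"[^\w\s]", "", text)

    reference = "近几年，不但我用书给女儿压岁。"
    hypothesis = "近几年不但我用书给女儿压岁"
    print(jiwer.cer(strip_punct(reference), strip_punct(hypothesis)))  # 0.0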

Health Check

  • Last commit: 1 month ago
  • Responsiveness: 1 week
  • Pull Requests (30d): 0
  • Issues (30d): 4

Star History

  • 86 stars in the last 90 days

Explore Similar Projects

Starred by Boris Cherny (creator of Claude Code; MTS at Anthropic), Andrej Karpathy (founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), and 19 more.

whisper by openai

  • Speech recognition model for multilingual transcription/translation
  • Top 0.4% · 86k stars · created 2 years ago · updated 1 month ago