This repository provides tools and scripts for fine-tuning OpenAI's Whisper speech recognition model using LoRA. It supports training with or without timestamp data, and even without speech data, enabling customization for specific domains or languages. The project also offers accelerated inference options and deployment solutions for web, Windows desktop, and Android applications.
How It Works
The core of the project involves fine-tuning Whisper using the LoRA (Low-Rank Adaptation) technique, which allows for efficient adaptation of large pre-trained models with significantly fewer trainable parameters. This approach enables training on diverse datasets, including those lacking timestamp information or even speech content for specific tasks. For inference, it leverages CTranslate2 and GGML for accelerated performance, and integrates with Hugging Face's Transformers library for broader compatibility.
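As a minimal sketch of what this looks like in practice, the following uses Hugging Face's `peft` library to wrap a Whisper checkpoint with LoRA adapters; the model name and hyperparameters (rank, alpha, target modules) are illustrative assumptions, not the repository's exact configuration:

```python
# Minimal LoRA setup sketch (illustrative hyperparameters, not
# necessarily the repository's exact configuration).
from transformers import WhisperForConditionalGeneration
from peft import LoraConfig, get_peft_model

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

lora_config = LoraConfig(
    r=32,                                 # low-rank dimension
    lora_alpha=64,                        # scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    bias="none",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights train
```

Because only the low-rank adapter matrices receive gradients, the memory and compute cost of fine-tuning drops sharply compared with full-parameter training.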
Quick Start & Requirements
- Installation: `pip install -r requirements.txt` (or use the provided Docker image `pytorch/pytorch:2.4.0-cuda11.8-cudnn9-devel`).
- Prerequisites: Python 3.11, PyTorch 2.4.0, CUDA 11.8 (recommended), and a GPU (an A100-PCIE-40GB is used in the examples). Windows users may need `bitsandbytes` from a specific GitHub release.
- Data Preparation: Requires data in JSON Lines format; an `aishell.py` script is provided for processing the AIShell dataset (a manifest sketch follows this list).
- Resources: Fine-tuning requires significant GPU memory and compute. Inference acceleration options are available.
- Documentation: Web Deployment, API Docs.
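To illustrate the JSON Lines format, here is a hypothetical manifest writer; the field names (`audio`, `sentence`, `sentences`, `duration`) are an assumed schema for illustration only, so consult `aishell.py` for the exact structure the training scripts expect:

```python
import json

# Hypothetical manifest entries; the field names below are an assumed
# schema for illustration -- consult aishell.py for the exact structure.
examples = [
    {
        "audio": {"path": "dataset/audio/0.wav"},
        "sentence": "hello world",
        "duration": 2.39,
        # Optional per-segment timestamps; omit when unavailable.
        "sentences": [{"start": 0.0, "end": 2.39, "text": "hello world"}],
    }
]

with open("dataset/train.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example, ensure_ascii=False) + "\n")
```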
Highlighted Details
- Supports fine-tuning Whisper models (tiny, base, small, medium, large, large-v2, large-v3).
- Offers accelerated inference via CTranslate2 and GGML (an inference sketch follows this list).
- Enables deployment to Web (API server), Windows desktop, and Android applications.
- Includes performance benchmarks showing significant speedups from various optimization techniques (Flash Attention 2, torch.compile, BetterTransformer).
- Provides detailed character error rate (CER) and word error rate (WER) test tables for various models and datasets.
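For the CTranslate2 path, a hedged sketch using the `faster-whisper` package (a common front end for CTranslate2 Whisper models); the model directory and audio path are placeholders, and the conversion command shown is CTranslate2's standard converter rather than a repository-specific tool:

```python
# Sketch: transcribing with a CTranslate2-converted Whisper model via
# faster-whisper. Paths are placeholders; convert a fine-tuned checkpoint
# first, e.g.:
#   ct2-transformers-converter --model ./whisper-finetuned \
#       --output_dir ./whisper-ct2 --quantization float16
from faster_whisper import WhisperModel

model = WhisperModel("./whisper-ct2", device="cuda", compute_type="float16")
segments, _info = model.transcribe("test.wav", beam_size=5)
for seg in segments:
    print(f"[{seg.start:.2f}s -> {seg.end:.2f}s] {seg.text}")
```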
Maintenance & Community
- Active development is implied by the inclusion of recent Whisper versions (large-v3).
- Community discussion is encouraged via a Knowledge Planet (知识星球) community and a QQ group.
Licensing & Compatibility
- The repository itself does not explicitly state a license. The underlying Whisper model is released under the MIT license.
- Compatibility for commercial use depends on the licensing of the base Whisper model and any other dependencies.
Limitations & Caveats
- Some model files and processed datasets are only available through the author's Knowledge Planet, which may require payment or membership.
- The README notes that removing punctuation during evaluation may be necessary for accurate scoring, implying potential issues with punctuation handling in fine-tuned models (a minimal scoring sketch follows).
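A minimal sketch of punctuation-insensitive CER scoring; the use of Hugging Face's `evaluate` library and the normalization regex are assumptions for illustration, not the repository's own evaluation code:

```python
# Sketch: strip punctuation (and spacing) before computing CER so that
# punctuation differences do not inflate the error rate. The `evaluate`
# library is an assumed choice here, not the repo's own script.
import re
import evaluate

cer = evaluate.load("cer")

def normalize(text: str) -> str:
    # Keep only word characters (covers CJK); drop punctuation and spaces.
    return re.sub(r"[^\w]", "", text, flags=re.UNICODE)

references = ["你好，世界。"]
predictions = ["你好 世界"]
score = cer.compute(
    predictions=[normalize(p) for p in predictions],
    references=[normalize(r) for r in references],
)
print(f"CER without punctuation: {score:.4f}")  # 0.0 for this toy pair
```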