Generative ASR error correction via cross-modal fusion
This project provides a framework for generative Automatic Speech Recognition (ASR) error correction by fusing the Whisper audio encoder with the LLaMA language model decoder. It targets researchers and practitioners in ASR and NLP, offering improved accuracy over traditional methods by leveraging cross-modal integration.
How It Works
The core approach uses the Whisper encoder to extract acoustic features from the audio input. These features are fused into the LLaMA decoder, which is prompted with n-best hypotheses from an upstream ASR system. This cross-modal fusion lets the LLM generate a corrected transcription, with a claimed relative Word Error Rate reduction (WERR) of 28.83% to 37.66%. The system is designed for parameter efficiency, with only 7.97M trainable parameters.
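The repository's actual fusion layers may be wired differently; the following is a minimal sketch, assuming a hypothetical low-rank adapter (AcousticFusionAdapter, with assumed dimensions of 1280 for the Whisper encoder and 4096 for the LLaMA hidden size) that projects pooled acoustic features into the decoder's hidden space and adds them through a learned gate, so only the adapter's small parameter set is trained.

```python
# Hypothetical sketch of the cross-modal fusion idea: pooled Whisper encoder
# features are projected through a low-rank adapter and added to the LLaMA
# decoder's hidden states; only the adapter parameters would be trainable.
import torch
import torch.nn as nn


class AcousticFusionAdapter(nn.Module):
    """Lightweight adapter that injects audio features into decoder states."""

    def __init__(self, audio_dim: int = 1280, text_dim: int = 4096, rank: int = 8):
        super().__init__()
        # Low-rank down/up projection keeps the trainable parameter count small.
        self.down = nn.Linear(audio_dim, rank, bias=False)
        self.up = nn.Linear(rank, text_dim, bias=False)
        self.gate = nn.Parameter(torch.zeros(1))  # tanh(0) = 0, so fusion starts as identity

    def forward(self, decoder_hidden: torch.Tensor, audio_features: torch.Tensor) -> torch.Tensor:
        # Pool the encoder output over time and add it as a gated bias to every
        # decoder position (a simplification of cross-attention-style fusion).
        pooled = audio_features.mean(dim=1)                 # (batch, audio_dim)
        injected = self.up(self.down(pooled)).unsqueeze(1)  # (batch, 1, text_dim)
        return decoder_hidden + torch.tanh(self.gate) * injected


# Toy usage with random tensors standing in for real encoder/decoder outputs.
adapter = AcousticFusionAdapter()
decoder_hidden = torch.randn(2, 16, 4096)    # (batch, tokens, LLaMA hidden size)
audio_features = torch.randn(2, 1500, 1280)  # (batch, frames, Whisper encoder size)
fused = adapter(decoder_hidden, audio_features)
print(fused.shape)  # torch.Size([2, 16, 4096])
```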
Quick Start & Requirements
Set up the environment with conda env create -f environment.yml or pip install -r requirements.txt.
Run the demo via demo.py.
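Conceptually, correction operates on an n-best list from an upstream ASR decoder. As a hypothetical illustration only (the repository's actual prompt template and demo interface may differ), the snippet below folds candidate transcriptions into a single instruction-style prompt for the fused decoder.

```python
# Hypothetical prompt construction for n-best ASR error correction; the exact
# template used by the repository's demo and training scripts may differ.
def build_correction_prompt(nbest_hypotheses: list[str]) -> str:
    """Format an n-best list into a single instruction-style prompt."""
    numbered = "\n".join(f"{i + 1}. {hyp}" for i, hyp in enumerate(nbest_hypotheses))
    return (
        "The following are candidate transcriptions of the same utterance:\n"
        f"{numbered}\n"
        "Write the most likely correct transcription."
    )


# Example n-best list as it might come from a beam-search ASR decoder.
print(build_correction_prompt([
    "i scream for ice cream",
    "eye scream for ice cream",
    "i scream four ice cream",
]))
```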
Launch training with python training/WL-S.py, passing arguments for the learning rate, GPU count, and data paths.
Highlighted Details
Maintenance & Community
The project builds on lit-llama, stanford_alpaca, and Whisper.
Licensing & Compatibility
Limitations & Caveats
The README notes that the accompanying paper is "[YET]" to be published, so details may still change. Hardware requirements beyond the use of GPUs for training are not specified.