Whispering-LLaMA by Srijith-rkr

Generative ASR error correction via cross-modal fusion

created 1 year ago
254 stars

Top 99.3% on sourcepulse

Project Summary

This project provides a framework for generative Automatic Speech Recognition (ASR) error correction by fusing the Whisper audio encoder with the LLaMA language model decoder. It targets researchers and practitioners in ASR and NLP, offering improved correction accuracy over text-only approaches by injecting acoustic information into the language model through cross-modal integration.

How It Works

The core approach involves using the Whisper encoder to extract acoustic features from audio input. These features are then integrated into the LLaMA decoder, which is prompted with n-best hypotheses from an ASR system. This cross-modal fusion allows the LLM to predict a more accurate transcript, with a claimed relative word error rate reduction (WERR) of 28.83% to 37.66%. The system is designed for parameter efficiency, with only 7.97M trainable parameters.
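
As a rough, hypothetical illustration of this idea (not the repository's actual code), the PyTorch sketch below fuses audio-encoder features into decoder hidden states through a small gated cross-attention adapter so that only the adapter is trained; every class, parameter, and dimension name in it is made up for the example.

    # Minimal conceptual sketch of cross-modal fusion for generative ASR error
    # correction, in the spirit of Whispering-LLaMA. This is NOT the repository's
    # implementation; all names and sizes here are illustrative only.
    import torch
    import torch.nn as nn


    class AcousticFusionAdapter(nn.Module):
        """Lightweight adapter that lets decoder hidden states attend to
        audio-encoder features (e.g., Whisper-style), then adds the result
        back as a gated residual. Only the adapter parameters would train."""

        def __init__(self, d_model: int, d_audio: int, n_heads: int = 4):
            super().__init__()
            self.audio_proj = nn.Linear(d_audio, d_model)   # map audio dims to decoder dims
            self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            self.gate = nn.Parameter(torch.zeros(1))        # zero-init: adapter starts as identity

        def forward(self, hidden: torch.Tensor, audio_feats: torch.Tensor) -> torch.Tensor:
            # hidden:      (batch, text_len, d_model)  decoder states over the n-best prompt
            # audio_feats: (batch, audio_len, d_audio) acoustic features from the audio encoder
            audio = self.audio_proj(audio_feats)
            fused, _ = self.cross_attn(query=hidden, key=audio, value=audio)
            return hidden + torch.tanh(self.gate) * fused   # gated residual fusion


    if __name__ == "__main__":
        batch, text_len, audio_len = 2, 32, 100
        d_model, d_audio = 512, 384                         # toy sizes, not the real models'

        # Stand-ins for (frozen) model outputs: decoder states over a prompt
        # listing ASR n-best hypotheses, and audio-encoder features.
        decoder_states = torch.randn(batch, text_len, d_model)
        audio_features = torch.randn(batch, audio_len, d_audio)

        adapter = AcousticFusionAdapter(d_model, d_audio)
        out = adapter(decoder_states, audio_features)
        print(out.shape)                                    # torch.Size([2, 32, 512])
        print(sum(p.numel() for p in adapter.parameters()), "trainable adapter parameters")

The zero-initialized gate makes the adapter start as an identity mapping and only gradually mix in acoustic information, a common choice in adapter-style fine-tuning; it also makes the small trainable-parameter count plausible, since the base encoder and decoder stay frozen.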

Quick Start & Requirements

  • Install dependencies using conda env create -f environment.yml or pip install -r requirements.txt.
  • Requires pre-trained Alpaca weights and LLaMA tokenizer weights.
  • Official Hugging Face checkpoints are available.
  • Demo usage is shown in demo.py.
  • Training script: python training/WL-S.py with arguments for learning rate, GPU count, and data paths.

Highlighted Details

  • Accepted at EMNLP 2023 (Main Track).
  • Reports relative word error rate reductions (WERR) of 28.83% to 37.66%.
  • Parameter-efficient fine-tuning with 7.97M trainable parameters.
  • Utilizes a novel cross-modal fusion technique.

Maintenance & Community

  • Built upon lit-llama, stanford_alpaca, and Whisper.
  • Paper, slides, and Hugging Face checkpoints are linked.

Licensing & Compatibility

  • License details are not explicitly stated in the README.
  • Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The README notes that the paper is "[YET]" to be published, so details may still change. Hardware requirements beyond the need for a GPU during training are not specified.

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 3 stars in the last 90 days
