Joint audio tagging and speech recognition model
Whisper-AT enhances OpenAI's Whisper by adding audio event tagging capabilities with minimal computational overhead. It targets users needing both speech transcription and sound event detection, offering a unified solution that maintains Whisper's ASR performance while providing 527-class AudioSet labels.
How It Works
Whisper-AT freezes the original Whisper encoder and trains a novel Time- and Layer-wise Transformer (TL-TR) on top of its intermediate representations. This approach leverages Whisper's robust audio understanding for audio tagging, achieving strong tagging performance at less than 1% additional computational cost compared to running a separate audio tagging model.
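Below is a minimal PyTorch sketch of this idea, intended only to convey the shape of such a head; the class name, layer sizes, and pooling choices are assumptions, not the released TL-TR implementation. It assumes the frozen Whisper encoder's per-layer states are stacked into a (batch, layers, frames, dim) tensor, applies self-attention over time within each layer, pools, applies self-attention across layers, and maps the result to 527 AudioSet logits.

import torch
import torch.nn as nn

class TLTRSketch(nn.Module):
    # Illustrative time- and layer-wise Transformer head (sizes are assumptions).
    def __init__(self, dim=1280, n_classes=527):
        super().__init__()
        time_block = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        layer_block = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.time_tr = nn.TransformerEncoder(time_block, num_layers=1)    # attention over time frames
        self.layer_tr = nn.TransformerEncoder(layer_block, num_layers=1)  # attention across encoder layers
        self.head = nn.Linear(dim, n_classes)

    def forward(self, reps):
        # reps: frozen Whisper encoder states from every layer, shape (batch, n_layers, n_frames, dim)
        b, l, t, d = reps.shape
        x = self.time_tr(reps.reshape(b * l, t, d))   # time-wise attention within each layer
        x = x.mean(dim=1).reshape(b, l, d)            # mean-pool over time
        x = self.layer_tr(x)                          # layer-wise attention
        return self.head(x.mean(dim=1))               # (batch, 527) AudioSet logits

logits = TLTRSketch()(torch.randn(1, 32, 50, 1280))  # e.g., 32 encoder layers, 50 frames
print(logits.shape)  # torch.Size([1, 527])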
Quick Start & Requirements
Install with pip:

pip install whisper-at

Alternatively, install the dependencies first and then install whisper-at without pulling them in again:

pip install numba numpy torch tqdm more-itertools tiktoken==0.3.3
pip install --no-deps whisper-at

ffmpeg must also be installed and available on the system PATH.
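To confirm the ffmpeg binary is reachable before transcribing, you can run, for example:

ffmpeg -version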
import whisper_at as whisper

# Load a Whisper checkpoint together with its audio tagging (TL-TR) head.
model = whisper.load_model("large-v1")
# Transcribe and tag sound events at a 10-second time resolution.
result = model.transcribe("audio.mp3", at_time_res=10)
print(result["text"])
# Map the tagging output to human-readable AudioSet labels, keeping the top 5 per segment.
audio_tag_result = whisper.parse_at_label(result, top_k=5)
print(audio_tag_result)
Highlighted Details
The examples above use the large-v1 model.

Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The at_time_res parameter must be an integer multiple of 0.4 seconds.
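If you need a resolution close to some target value, a small hypothetical helper like the one below can snap it to a valid multiple of 0.4 seconds; the function name and rounding policy are illustrative, not part of the library.

# Hypothetical helper: snap a desired tagging resolution to the nearest valid multiple of 0.4 s.
def valid_at_time_res(seconds: float) -> float:
    return max(1, round(seconds / 0.4)) * 0.4

print(valid_at_time_res(9.9))  # 10.0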