whisper-at  by YuanGongND

Joint audio tagging and speech recognition model

created 2 years ago
400 stars

Top 73.4% on sourcepulse

Project Summary

Whisper-AT enhances OpenAI's Whisper by adding audio event tagging capabilities with minimal computational overhead. It targets users needing both speech transcription and sound event detection, offering a unified solution that maintains Whisper's ASR performance while providing 527-class AudioSet labels.

How It Works

Whisper-AT freezes the original Whisper encoder and trains a Time- and Layer-wise Transformer (TL-TR) on top of the encoder's representations. Because the tagging model reuses the audio features Whisper already computes for speech recognition, audio tagging adds less than 1% computational cost on top of ASR, rather than the cost of running a separate tagging model.

Quick Start & Requirements

  • Install via pip: pip install whisper-at
  • On Mac/Windows, use this two-step workaround: first run pip install numba numpy torch tqdm more-itertools tiktoken==0.3.3, then run pip install --no-deps whisper-at
  • Requires ffmpeg.
  • Usage example:
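import whisper_at as whisper
model = whisper.load_model("large-v1")
# at_time_res is the audio tagging time resolution in seconds
result = model.transcribe("audio.mp3", at_time_res=10)
# ASR result
print(result["text"])
# Audio tagging result: top 5 labels per time segment
audio_tag_result = whisper.parse_at_label(result, top_k=5)
print(audio_tag_result)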
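
The printed tagging result can be hard to scan, so a small formatter may help. The sketch below assumes that parse_at_label returns a list of dictionaries, each with a "time" entry holding "start" and "end" and an "audio tags" entry holding (label, score) pairs. That shape is an assumption and should be checked against the actual output. The helper name format_audio_tags and the sample data are hypothetical and are not part of whisper-at.

```python
def format_audio_tags(audio_tag_result):
    """Return one readable line per time segment.

    Assumes each segment looks like:
    {"time": {"start": 0, "end": 10}, "audio tags": [("Music", 1.82), ...]}
    """
    lines = []
    for segment in audio_tag_result:
        start = segment["time"]["start"]
        end = segment["time"]["end"]
        tags = ", ".join(
            f"{label} ({score:.2f})" for label, score in segment["audio tags"]
        )
        # Segments with no tags are still listed so gaps stay visible.
        lines.append(f"{start}-{end}s: " + (tags or "(no tags)"))
    return lines


# Hand-written sample in the assumed shape; this is not real model output.
sample = [
    {"time": {"start": 0, "end": 10}, "audio tags": [("Music", 1.82), ("Speech", 0.93)]},
    {"time": {"start": 10, "end": 20}, "audio tags": []},
]

for line in format_audio_tags(sample):
    print(line)
```

With real output, the call would be format_audio_tags(audio_tag_result), provided the structure matches the assumption above.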

Highlighted Details

  • Achieves 42.1 mAP on AudioSet with the large-v1 model.
  • Supports all OpenAI Whisper model sizes.
  • Low-compute versions project AT model dimensions to 512 for reduced memory usage.
  • Maintains identical ASR performance and API to the original Whisper.

Maintenance & Community

  • Primary contact: yuangong@mit.edu. GitHub issues are preferred for questions.
  • Based on Interspeech 2023 paper.

Licensing & Compatibility

  • BSD license, similar to Whisper's MIT license.
  • Commercial use is permitted.

Limitations & Caveats

  • A known installation bug affects Mac/Windows users; use the two-step pip workaround listed under Quick Start & Requirements.
  • The at_time_res parameter must be an integer multiple of 0.4 seconds.
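
The 0.4-second constraint can be checked before calling transcribe. The following is a minimal sketch using only the Python standard library. The helper names is_valid_at_time_res and nearest_valid_at_time_res are hypothetical and are not part of whisper-at; the 0.4-second step is taken from the caveat above.

```python
from decimal import Decimal

# Base step in seconds, taken from the caveat above.
BASE_RES = Decimal("0.4")


def is_valid_at_time_res(at_time_res):
    """Return True if at_time_res is a positive integer multiple of 0.4 s."""
    # Going through str() avoids binary floating-point noise for literals
    # such as 0.8 or 1.2. Values produced by float arithmetic (for example
    # 0.4 * 3) may still carry rounding error and be rejected.
    value = Decimal(str(at_time_res))
    return value > 0 and value % BASE_RES == 0


def nearest_valid_at_time_res(at_time_res):
    """Snap a requested resolution to the closest valid multiple (at least 0.4 s)."""
    steps = max(1, round(Decimal(str(at_time_res)) / BASE_RES))
    return float(steps * BASE_RES)


print(is_valid_at_time_res(10))        # 10 s is 25 steps of 0.4 s
print(is_valid_at_time_res(1.0))       # 1.0 s is not a multiple of 0.4 s
print(nearest_valid_at_time_res(1.1))  # snaps to 3 steps of 0.4 s
```

Snapping to the nearest multiple is only one possible policy; raising an error on invalid input may be preferable when the resolution matters downstream.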

Health Check

  • Last commit: 1 year ago
  • Responsiveness: 1 day
  • Pull requests (30d): 1
  • Issues (30d): 0

Star History

  • 23 stars in the last 90 days
