OLMoASR by allenai

Open-source speech recognition models

Created 2 years ago

470 stars

Top 64.7% on SourcePulse

Project Summary

This repository provides OLMoASR, an open-source implementation of the Whisper ASR model. It covers the full pipeline for training robust speech recognition models, from data processing through evaluation, and targets researchers and developers working on ASR. The project releases open models and training data so that ASR results can be reproduced and customized.

How It Works

OLMoASR follows a data-centric approach. Its data preparation steps cover transcript formatting, audio segmentation, and language alignment. It uses ffmpeg for audio processing and wandb for experiment tracking. Training is launched with torchrun for distributed runs, with configurable model size, learning rate, batch size, and other hyperparameters. The models are trained on OLMoASR-Mix, a large-scale audio-text dataset.
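To make the audio segmentation step concrete, here is a minimal sketch of cutting a recording into fixed-length windows and building the matching ffmpeg command. It is not the repository's own preprocessing code. The function names `segment_windows` and `ffmpeg_segment_cmd` are hypothetical, and the 30-second window and 16 kHz mono output are assumptions borrowed from the Whisper input format.

```python
from typing import List, Tuple

# Assumption: 30-second windows, the chunk length Whisper-style models consume.
SEGMENT_SECONDS = 30.0


def segment_windows(duration: float, window: float = SEGMENT_SECONDS) -> List[Tuple[float, float]]:
    """Split [0, duration) into consecutive (start, end) windows of at most `window` seconds."""
    if duration <= 0 or window <= 0:
        return []
    windows = []
    start = 0.0
    while start < duration:
        end = min(start + window, duration)
        windows.append((start, end))
        start = end
    return windows


def ffmpeg_segment_cmd(src: str, dst: str, start: float, end: float) -> List[str]:
    """Build (but do not run) an ffmpeg command that cuts one window.

    Assumption: output is resampled to 16 kHz mono; adjust to the real pipeline.
    """
    return [
        "ffmpeg", "-y",               # overwrite the output file if it exists
        "-ss", f"{start:.3f}",        # seek to the window start (seconds)
        "-t", f"{end - start:.3f}",   # read only the window's duration
        "-i", src,
        "-ar", "16000",               # output sample rate in Hz
        "-ac", "1",                   # single (mono) output channel
        dst,
    ]
```

The command is returned as an argument list so it can be inspected or passed to `subprocess.run` without shell quoting issues.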

Quick Start & Requirements

Installation: Clone the repository, set up a Python environment (>= 3.8), and install requirements:

git clone https://github.com/allenai/OLMoASR.git
cd OLMoASR
pip install -r requirements/requirements.txt
pip install -e .

Prerequisites: ffmpeg, wandb.
Data: Download data from OLMoASR-Pool on Hugging Face and organize as specified.
Links: OLMoASR on Hugging Face

Highlighted Details

Offers pre-trained models for both short-form and long-form speech recognition, with performance benchmarks (WER) provided for various datasets.
The data processing pipeline includes multiple stages for cleaning, segmenting, and filtering audio-text pairs, supporting language identification and alignment.
Training scripts are highly configurable, supporting distributed training (DDP, FSDP) with detailed options for optimization, logging, and evaluation.
A Python API is available for transcription; CLI support is still under development.
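For readers unfamiliar with the WER metric used in the benchmarks, the sketch below shows the standard definition: word-level edit distance between reference and hypothesis, divided by the number of reference words. It is illustrative only and not the project's evaluation code. The function names are hypothetical, and the lowercase-and-split normalization is a simplifying assumption; real evaluations usually apply a fuller text normalizer.

```python
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance over word lists (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))  # distance from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from the reference prefix to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance divided by the reference word count."""
    # Assumption: lowercase + whitespace split stands in for a real normalizer.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)
```

Because the denominator is the reference length, a hypothesis with many inserted words can produce a WER above 1.0.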

Maintenance & Community

Developed by a team including Huong Ngo, Matt Deitke, and Martijn Bartelds, with contributions from others.
Acknowledges assistance from OpenAI's Whisper code and resource support from Ai2 and UW.
No specific community links (Discord/Slack) or roadmap are mentioned in the README.

Licensing & Compatibility

The license is not explicitly stated in the provided README excerpt. Citation information is also pending.

Limitations & Caveats

CLI usage support is currently in development.
One of the evaluation datasets mentioned, Artie Bias Corpus, is noted as no longer available from its original source.
The README does not specify the exact license or provide details on commercial use compatibility.

Health Check

Last commit: 2 months ago
Responsiveness: Inactive
Star history: 2 stars in the last 30 days