Open-source speech recognition models
This repository provides OLMoASR, an open-source implementation of the Whisper ASR model. It offers a complete pipeline for training robust speech recognition models, from data processing through evaluation, and targets researchers and developers working on ASR. The project aims to release open models and training data so that ASR systems can be reproduced and customized.
How It Works
OLMoASR follows a data-centric approach, detailing the steps for data preparation: transcript formatting, audio segmentation, and language alignment. It uses ffmpeg for audio processing and wandb for experiment tracking. Training runs through torchrun for distributed execution, with configurable model size, learning rate, batch size, and other hyperparameters. The models are trained on OLMoASR-Mix, a large-scale audio-text dataset.
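The audio segmentation step described above can be illustrated with a minimal sketch. This is not the repository's own code: the function names `segment_bounds` and `build_segment_cmd`, the 30-second window, and the 16 kHz mono output are assumptions chosen for illustration. Only the ffmpeg options used (`-y`, `-ss`, `-i`, `-t`, `-ac`, `-ar`) are standard ffmpeg flags.

```python
def segment_bounds(total_seconds, segment_seconds=30.0):
    """Split a recording into consecutive (start, length) windows.

    The 30-second default is an assumption, not a value taken from OLMoASR.
    """
    bounds = []
    start = 0.0
    while start < total_seconds:
        # The last window is shortened so it does not run past the end.
        length = min(segment_seconds, total_seconds - start)
        bounds.append((start, length))
        start += segment_seconds
    return bounds


def build_segment_cmd(src, dst, start, length, sample_rate=16000):
    """Build an ffmpeg argument list that cuts one mono segment.

    Hypothetical helper: it only constructs the command and does not run it.
    """
    return [
        "ffmpeg", "-y",          # overwrite the output file if it exists
        "-ss", str(start),       # seek to the segment start (seconds)
        "-i", src,               # input media file
        "-t", str(length),       # segment duration (seconds)
        "-ac", "1",              # downmix to a single channel
        "-ar", str(sample_rate), # resample to the target rate
        dst,
    ]


if __name__ == "__main__":
    # Print the commands for a hypothetical 65-second file.
    for i, (start, length) in enumerate(segment_bounds(65.0)):
        print(" ".join(build_segment_cmd("input.mp4", f"seg_{i:03d}.wav", start, length)))
```

Each returned list could be passed to `subprocess.run(cmd, check=True)` on a machine where ffmpeg is installed. Building the command separately from running it keeps the windowing logic testable without any audio files.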
Quick Start & Requirements
git clone https://github.com/allenai/OLMoASR.git
cd OLMoASR
pip install -r requirements/requirements.txt
pip install -e .
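Since the pipeline relies on ffmpeg and wandb, it may help to confirm both are available after installing. The following is a small sketch using only the Python standard library. The helper name `missing_prerequisites` is hypothetical and is not part of OLMoASR.

```python
import importlib.util
import shutil


def missing_prerequisites(executables=("ffmpeg",), modules=("wandb",)):
    """Return labels for any executables or top-level modules that cannot be found."""
    # shutil.which returns None when the executable is not on PATH.
    missing = [f"executable:{name}" for name in executables
               if shutil.which(name) is None]
    # find_spec returns None when a top-level module is not importable.
    missing += [f"module:{name}" for name in modules
                if importlib.util.find_spec(name) is None]
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    if problems:
        print("Missing prerequisites:", ", ".join(problems))
    else:
        print("ffmpeg and wandb were found.")
```

This check only confirms that the tools can be located. It does not verify versions or that wandb is logged in.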
Prerequisites: ffmpeg, wandb.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats