OLMoASR by allenai

Open-source speech recognition models

Created 1 year ago
427 stars

Top 69.2% on SourcePulse

Project Summary

OLMoASR is an open-source implementation of the Whisper ASR model. The repository covers the full pipeline for training robust speech recognition models, from data processing to evaluation, and targets researchers and developers in the ASR domain. The project aims to release open models and training data so that ASR solutions can be reproduced and customized.

How It Works

OLMoASR follows a data-centric approach: the repository documents each data preparation step, including transcript formatting, audio segmentation, and language alignment. It uses ffmpeg for audio processing and wandb for experiment tracking. Training is launched with torchrun for distributed runs, and model size, learning rate, batch size, and other hyperparameters are configurable. The models are trained on a large-scale audio-text dataset, OLMoASR-Mix.
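
As an illustration of the audio segmentation step, here is a minimal Python sketch of how a long recording might be cut into fixed-length clips with ffmpeg. This is not the repository's own code: the function names, the 30-second window, and the 16 kHz mono output format are assumptions chosen for the example (they match common Whisper-style preprocessing, but the README summary does not confirm them). Only standard ffmpeg options are used (-ss, -t, -i, -ac, -ar).

```python
from typing import List


def segment_starts(total_s: float, window_s: float = 30.0) -> List[float]:
    """Start offsets (seconds) of consecutive, non-overlapping windows.

    The 30 s default is an assumption for this sketch.
    """
    if window_s <= 0:
        raise ValueError("window_s must be > 0")
    starts = []
    t = 0.0
    while t < total_s:
        starts.append(t)
        t += window_s
    return starts


def ffmpeg_segment_cmd(src: str, dst: str, start_s: float,
                       duration_s: float = 30.0,
                       sample_rate: int = 16000) -> List[str]:
    """Build (but do not run) an ffmpeg command that cuts one clip from src."""
    if start_s < 0 or duration_s <= 0:
        raise ValueError("start_s must be >= 0 and duration_s must be > 0")
    return [
        "ffmpeg", "-nostdin", "-y",   # no interactive input, overwrite output
        "-ss", f"{start_s:.3f}",      # seek to the segment start
        "-t", f"{duration_s:.3f}",    # read at most this many seconds
        "-i", src,
        "-ac", "1",                   # downmix to mono (assumed format)
        "-ar", str(sample_rate),      # resample (16 kHz assumed)
        dst,
    ]
```

To actually cut the clips, each command can be passed to subprocess.run(cmd, check=True), with one output path per offset returned by segment_starts.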

Quick Start & Requirements

  • Installation: Clone the repository, set up a Python environment (>= 3.8), and install the requirements from the repository root:
    git clone https://github.com/allenai/OLMoASR.git
    cd OLMoASR
    pip install -r requirements/requirements.txt
    pip install -e .
    
  • Prerequisites: ffmpeg, wandb.
  • Data: Download data from OLMoASR-Pool on Hugging Face and organize as specified.
  • Links: OLMoASR HuggingFace

Highlighted Details

  • Offers pre-trained models for both short-form and long-form speech recognition, with performance benchmarks (WER) provided for various datasets.
  • The data processing pipeline includes multiple stages for cleaning, segmenting, and filtering audio-text pairs, supporting language identification and alignment.
  • Training scripts are highly configurable, supporting distributed training (DDP, FSDP) with detailed options for optimization, logging, and evaluation.
  • A Python API is available for transcription; CLI support is under development.
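
The benchmarks above are reported as word error rate (WER): the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The following is a minimal, self-contained sketch of that metric. It is not the project's evaluation code, and the only text normalization it applies is lowercasing and whitespace splitting, which is an assumption made for brevity; published ASR evaluations typically normalize punctuation and numerals as well.

```python
from typing import List


def edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance using two rolling rows."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from hyp
                curr[j - 1] + 1,     # insertion: extra word in hyp
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hyp) / len(ref)
```

For example, wer("hello world", "hello there world") counts one inserted word against two reference words and returns 0.5. Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.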

Maintenance & Community

  • Developed by a team including Huong Ngo, Matt Deitke, and Martijn Bartelds, with contributions from others.
  • Acknowledges assistance from OpenAI's Whisper code and resource support from Ai2 and UW.
  • No specific community links (Discord/Slack) or roadmap are mentioned in the README.

Licensing & Compatibility

  • The license is not explicitly stated in the provided README excerpt. Citation information is also pending.

Limitations & Caveats

  • CLI usage support is currently in development.
  • One of the evaluation datasets mentioned, Artie Bias Corpus, is noted as no longer available from its original source.
  • The README does not specify the exact license or provide details on commercial use compatibility.

Health Check

  • Last commit: 3 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 11
  • Issues (30d): 2

Star History

  • 425 stars in the last 30 days
