ocotillo by neonbjb

Speech recognition model built on PyTorch

Created 4 years ago

254 stars

Project Summary

Ocotillo is an English speech recognition library for developers and researchers who need accurate transcription. It wraps a fine-tuned wav2vec2 model behind a simple interface, which makes it straightforward to add speech-to-text to applications, and it uses TorchScript tracing to speed up inference.

How It Works

Ocotillo uses a wav2vec2 model, jbetker/wav2vec2-large-robust-ft-libritts-voxpopuli, fine-tuned for both speech recognition and punctuation prediction. Its main performance advantage comes from TorchScript tracing: the PyTorch model is traced into a TorchScript graph that runs in PyTorch's C++ runtime rather than through the Python interpreter. This reduces per-call overhead and speeds up inference, especially on GPUs.
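A minimal sketch of what tracing does, using a hypothetical TinyAcousticModel as a stand-in for the wav2vec2 network (it is not ocotillo's model or code). torch.jit.trace runs the module once on an example input and records the executed tensor operations as a TorchScript graph; the traced module then produces the same outputs as the eager one.

```python
import torch
import torch.nn as nn


class TinyAcousticModel(nn.Module):
    """Hypothetical stand-in for a wav2vec2-style model: maps raw audio
    to per-frame character logits. Not ocotillo's actual model."""

    def __init__(self, vocab_size: int = 32):
        super().__init__()
        # Strided 1-D convolution downsamples the waveform into frames.
        self.conv = nn.Conv1d(1, 64, kernel_size=400, stride=320)
        self.proj = nn.Linear(64, vocab_size)

    def forward(self, audio: torch.Tensor) -> torch.Tensor:
        # audio: (batch, samples) -> (batch, 1, samples)
        frames = torch.relu(self.conv(audio.unsqueeze(1)))
        # (batch, 64, frames) -> (batch, frames, 64) -> (batch, frames, vocab)
        return self.proj(frames.transpose(1, 2))


model = TinyAcousticModel().eval()
example = torch.randn(2, 16000)  # two one-second clips at 16 kHz

with torch.no_grad():
    # Tracing executes the model once and records the ops it ran
    # as a TorchScript graph that no longer depends on Python code.
    traced = torch.jit.trace(model, example)
    eager_out = model(example)
    traced_out = traced(example)

print(traced_out.shape)  # torch.Size([2, 49, 32])
print(torch.allclose(eager_out, traced_out, atol=1e-5))  # True
```

Because tracing records only the operations executed for the example input, any data-dependent Python control flow is frozen into the graph; that is acceptable for a fixed feed-forward network like this one. A traced module can be saved with torch.jit.save and reloaded with torch.jit.load without the original Python class.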

Quick Start & Requirements

Install PyTorch: https://pytorch.org/get-started/locally/

Clone and install:

git clone https://github.com/neonbjb/ocotillo.git
cd ocotillo
python setup.py install

Dependencies: PyTorch and Hugging Face Transformers. An NVIDIA GPU is recommended for best performance.
Demo Colab notebook: asr_demo.ipynb
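After installing, a quick sanity check confirms that PyTorch imports and whether a CUDA GPU is visible. This sketch uses only standard PyTorch calls and does not exercise ocotillo itself; the 16 kHz dummy clip is an assumption based on the sample rate wav2vec2 models typically expect.

```python
import torch

# Prefer an NVIDIA GPU when PyTorch can see one; otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"PyTorch {torch.__version__}, running on {device}")

# Hypothetical stand-in input: one second of silent 16 kHz audio on the device.
audio = torch.zeros(1, 16000, device=device)
print(audio.shape, audio.device)
```

If this reports cpu on a machine with an NVIDIA card, the installed PyTorch build likely lacks CUDA support, and reinstalling from the PyTorch "get started" page with a CUDA build is the usual fix.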

Highlighted Details

Processes audio 329x faster than real time on an NVIDIA A5000 GPU with a batch size of 16.
Fine-tuned wav2vec2 model predicts punctuation for improved transcription quality.
Offers a simple CLI, batch processing script (transcribe.py), and a Python API (Transcriber class).
Includes an HTTP server compatible with Mycroft AI for remote speech-to-text services.
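To put the reported 329x figure in concrete terms, the arithmetic below estimates wall-clock time and the number of batched forward passes. Both helper names are hypothetical, and the sketch assumes the speedup holds constant; real throughput will vary with clip lengths, padding, and I/O.

```python
import math


def transcription_time(audio_seconds: float, speedup: float = 329.0) -> float:
    """Wall-clock seconds to transcribe `audio_seconds` of audio at a given
    faster-than-real-time factor (329x is the reported A5000, batch-16 figure)."""
    return audio_seconds / speedup


def num_batches(num_clips: int, batch_size: int = 16) -> int:
    """Number of forward passes needed when clips are grouped into batches."""
    return math.ceil(num_clips / batch_size)


one_hour = 60 * 60
print(f"{transcription_time(one_hour):.1f} s")  # 10.9 s (3600 / 329)
print(num_batches(100))  # 7 (six full batches of 16, one of 4)
```

At that rate an hour of audio takes roughly 11 seconds of GPU time, assuming the audio is already loaded and batched.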

Maintenance & Community

The README acknowledges Patrick von Platen for his Hugging Face API work and blog posts.
No explicit community links (Discord/Slack) or roadmap provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project focuses on English speech recognition and has not been tested on embedded hardware such as the Raspberry Pi. Because no license is stated, the licensing status should be clarified before commercial adoption.

Explore Similar Projects

speech-recognition-uk by egorsmkv

orate by haydenbleasel

reverb by revdotcom

echogarden by echogarden-project

edgedict by theblackcat102

AudioToText by Carleslc

pywhispercpp by absadiki

ichigo by janhq

whisper-plus by kadirnar

pocketsphinx.js by syl22-00

whisperX by m-bain

whisper by openai