Speech recognition model built on PyTorch
Ocotillo provides a performant and user-friendly speech recognition solution, targeting developers and researchers who need accurate English transcription. It simplifies the process of integrating state-of-the-art speech-to-text capabilities into applications, offering significant speed improvements through TorchScript optimization.
How It Works
Ocotillo uses a wav2vec2 model, specifically jbetker/wav2vec2-large-robust-ft-libritts-voxpopuli, fine-tuned for speech recognition and punctuation prediction. Its core advantage is TorchScript tracing, which records the PyTorch model as a static graph that runs in PyTorch's C++ runtime without Python interpreter overhead. This keeps per-call overhead low and raises throughput, especially on GPUs.
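To make the tracing step concrete, here is a minimal sketch of how TorchScript tracing generally works in PyTorch. It is not Ocotillo's own code: TinyAcousticModel is a hypothetical stand-in for the wav2vec2 network, and the layer sizes and the one-second random input are assumptions chosen only to keep the example small.

```python
import io

import torch
import torch.nn as nn


class TinyAcousticModel(nn.Module):
    """Hypothetical stand-in for a wav2vec2-style network (not Ocotillo's model)."""

    def __init__(self, vocab_size: int = 32):
        super().__init__()
        self.encoder = nn.Conv1d(1, 16, kernel_size=10, stride=5)
        self.head = nn.Linear(16, vocab_size)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) -> logits: (batch, frames, vocab)
        features = torch.relu(self.encoder(waveform.unsqueeze(1)))
        return self.head(features.transpose(1, 2))


torch.manual_seed(0)
model = TinyAcousticModel().eval()
example = torch.randn(1, 16000)  # one second of fake 16 kHz audio

# Tracing runs the model once on the example input and records the
# tensor operations it performs as a static TorchScript graph.
with torch.no_grad():
    traced = torch.jit.trace(model, example)

# The traced module serializes without the Python class definition,
# so it can be reloaded later (here from an in-memory buffer).
buffer = io.BytesIO()
torch.jit.save(traced, buffer)
buffer.seek(0)
restored = torch.jit.load(buffer)

# The restored graph should reproduce the eager model's output.
with torch.no_grad():
    eager_logits = model(example)
    traced_logits = restored(example)

outputs_match = torch.allclose(eager_logits, traced_logits, atol=1e-6)
```

Because tracing records only the operations executed for the example input, any data-dependent Python control flow in a model would be frozen to the path taken during that single run. That is a general property of torch.jit.trace rather than a statement about how Ocotillo handles it.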
Quick Start & Requirements
git clone https://github.com/neonbjb/ocotillo.git
cd ocotillo
python setup.py install
asr_demo.ipynb
Highlighted Details
[…] (transcribe.py), and a Python API (Transcriber class).
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The project is focused on English speech recognition and has not been tested on embedded hardware like the Raspberry Pi. The licensing status requires clarification for commercial adoption.