ocotillo by neonbjb

Speech recognition model built on PyTorch

created 3 years ago
253 stars

Project Summary

Ocotillo is a fast, easy-to-use English speech recognition package aimed at developers and researchers who need accurate transcription. It simplifies adding speech-to-text to applications and speeds up inference by running the model through TorchScript.

How It Works

Ocotillo uses a wav2vec2 model, jbetker/wav2vec2-large-robust-ft-libritts-voxpopuli, fine-tuned for both speech recognition and punctuation prediction. Its main speed advantage comes from TorchScript tracing: the PyTorch model is recorded as a TorchScript graph that PyTorch's C++ runtime can execute without going through the Python interpreter for each operation. This reduces per-call overhead and speeds up inference, especially on GPUs.
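The tracing step can be illustrated with a small sketch. `TinyAcousticModel` below is a hypothetical stand-in, not Ocotillo's code and not the real wav2vec2 architecture; it only shows how `torch.jit.trace` turns a module into a TorchScript graph that can be saved and reloaded without the original Python class definition.

```python
import io

import torch
import torch.nn as nn


class TinyAcousticModel(nn.Module):
    """Hypothetical stand-in for a wav2vec2 CTC model: waveform in, per-frame logits out."""

    def __init__(self, vocab_size: int = 32):
        super().__init__()
        # A strided 1-D convolution loosely mimics wav2vec2's downsampling feature encoder.
        self.encoder = nn.Conv1d(1, 64, kernel_size=400, stride=320)
        self.head = nn.Linear(64, vocab_size)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # (batch, samples) -> (batch, 1, samples) -> (batch, 64, frames)
        feats = torch.relu(self.encoder(waveform.unsqueeze(1)))
        # (batch, 64, frames) -> (batch, frames, vocab_size)
        return self.head(feats.transpose(1, 2))


model = TinyAcousticModel().eval()
example = torch.randn(2, 16000)  # two 1-second clips at an assumed 16 kHz sample rate

with torch.no_grad():
    # Tracing runs the model once on `example` and records the executed ops as a graph.
    traced = torch.jit.trace(model, example)

    # The traced module can be serialized and reloaded without TinyAcousticModel's source.
    buffer = io.BytesIO()
    torch.jit.save(traced, buffer)
    buffer.seek(0)
    reloaded = torch.jit.load(buffer)

    eager_out = model(example)
    traced_out = traced(example)
    reloaded_out = reloaded(example)

print(tuple(traced_out.shape))  # (batch, frames, vocab_size)
```

Tracing records only the operations executed for the example input, so data-dependent Python control flow is not captured; `torch.jit.script` is the TorchScript alternative for models that need it.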

Quick Start & Requirements

  • Install PyTorch: https://pytorch.org/get-started/locally/
  • Clone and install:
    git clone https://github.com/neonbjb/ocotillo.git
    cd ocotillo
    python setup.py install
    
  • Dependencies: PyTorch and Hugging Face Transformers. A GPU (NVIDIA recommended) gives the best performance.
  • Demo Colab notebook: asr_demo.ipynb
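After installation, a quick check that the listed dependencies can be found, and that PyTorch sees a GPU, may save some debugging time. This is a minimal sketch using the standard library plus an optional PyTorch call; the helper names are hypothetical and not part of Ocotillo.

```python
import importlib.util

# Import names of the dependencies listed above:
# PyTorch imports as `torch`, Hugging Face Transformers as `transformers`.
REQUIRED = ("torch", "transformers")


def check_dependencies(names=REQUIRED):
    """Map each package name to whether Python can locate it (without importing it)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def cuda_available():
    """Report whether PyTorch can see a CUDA GPU; False if torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch  # imported lazily so the check still runs without PyTorch

    return torch.cuda.is_available()


status = check_dependencies()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'MISSING'}")
print("CUDA GPU visible to PyTorch:", cuda_available())
```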

Highlighted Details

  • Reported throughput of 329x faster than real time on an NVIDIA A5000 GPU with a batch size of 16.
  • The fine-tuned wav2vec2 model also predicts punctuation, producing more readable transcripts.
  • Offers a simple CLI, a batch-processing script (transcribe.py), and a Python API (the Transcriber class).
  • Includes an HTTP server compatible with Mycroft AI for remote speech-to-text services.
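The summary does not say exactly how the 329x figure was measured. Assuming the usual definition (seconds of audio transcribed per second of wall-clock time), the arithmetic works out as in this sketch; the helper names are hypothetical and not part of Ocotillo.

```python
def realtime_speedup(audio_seconds: float, processing_seconds: float) -> float:
    """Seconds of audio transcribed per second of wall-clock time."""
    if processing_seconds <= 0:
        raise ValueError("processing_seconds must be positive")
    return audio_seconds / processing_seconds


def processing_time(audio_seconds: float, speedup: float) -> float:
    """Wall-clock seconds needed to transcribe `audio_seconds` of audio at `speedup`."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


# One hour of audio at the reported 329x rate.
hour = 3600.0
print(f"{processing_time(hour, 329):.1f} s")  # prints "10.9 s"
```

The figure depends on hardware and batch size (an A5000 at batch size 16 here), so throughput on other setups, or with single unbatched requests, may be quite different.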

Maintenance & Community

  • The README acknowledges Patrick von Platen for his work on the Hugging Face API and related blog posts.
  • No explicit community links (Discord/Slack) or roadmap provided in the README.

Licensing & Compatibility

  • The README does not explicitly state a license, so it is unclear whether the code may be used commercially or linked into closed-source software.

Limitations & Caveats

The project is focused on English speech recognition and has not been tested on embedded hardware like the Raspberry Pi. The licensing status requires clarification for commercial adoption.

Health Check

  • Last commit: 3 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 0 stars in the last 90 days
