moonshine  by moonshine-ai

Speech-to-text models optimized for fast, accurate ASR on edge devices

created 10 months ago
2,803 stars

Top 17.4% on sourcepulse

Project Summary

Moonshine is a family of automatic speech recognition (ASR) models designed for fast and accurate transcription on resource-constrained edge devices. It targets real-time applications like live captioning and voice commands, offering competitive word-error rates (WER) compared to similarly sized OpenAI Whisper models.
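For readers unfamiliar with the metric, WER is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words. The following is a minimal sketch of that calculation; the function name `word_error_rate` and the lowercase-and-split normalization are assumptions made for illustration, and published benchmarks typically apply their own text normalization before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    # Assumption: simple lowercase + whitespace tokenization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Lower is better: a WER of 0 means the output matches the reference word for word, and the value can exceed 1 when the output contains many extra words.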

How It Works

Moonshine's architecture scales its compute with the length of the input audio instead of padding every input to a fixed 30-second chunk, as Whisper does. Shorter inputs are therefore processed much faster: the project claims a 5x speedup over Whisper on 10-second segments, with comparable or better accuracy.
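To get a feel for why fixed-size chunks are costly on short inputs, the sketch below compares how many audio samples a model has to process with and without padding to 30 seconds. It is back-of-the-envelope arithmetic only, not a benchmark: the 16 kHz sample rate and the helper names are assumptions, and the ratio it prints is not the claimed 5x figure, which is a measured result this sketch does not attempt to reproduce.

```python
import math

SAMPLE_RATE_HZ = 16_000   # assumption: 16 kHz mono input
FIXED_WINDOW_S = 30.0     # fixed chunk length used for the comparison


def samples_processed(duration_s: float, fixed_window: bool = False) -> int:
    """Number of audio samples the model sees for a clip of the given length."""
    if fixed_window:
        # Every clip is padded up to a whole number of 30-second windows.
        windows = max(1, math.ceil(duration_s / FIXED_WINDOW_S))
        return int(windows * FIXED_WINDOW_S * SAMPLE_RATE_HZ)
    # Variable-length processing: only the real audio is consumed.
    return int(duration_s * SAMPLE_RATE_HZ)


def padding_overhead(duration_s: float) -> float:
    """Ratio of padded input size to actual input size."""
    return samples_processed(duration_s, fixed_window=True) / samples_processed(duration_s)


if __name__ == "__main__":
    for seconds in (2.0, 10.0, 30.0):
        print(f"{seconds:>4.0f} s clip: {padding_overhead(seconds):.1f}x input after padding")
```

The overhead shrinks as clips approach 30 seconds, which matches the summary's point that the gains are largest for short segments such as voice commands.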

Quick Start & Requirements

  • Installation: Install with uv's pip interface: uv pip install useful-moonshine (for Torch, TensorFlow, JAX) or uv pip install useful-moonshine-onnx (for ONNX).
  • Prerequisites: Python 3.x. A CUDA-capable GPU is recommended for faster inference, especially with JAX.
  • Usage: moonshine.transcribe('audio.wav', 'moonshine/tiny')
  • Models: Available via HuggingFace Hub (UsefulSensors/moonshine-tiny, UsefulSensors/moonshine-base).
  • Demos: A live-captions demo and a browser demo are available.
  • Docs: Blog, Paper, Model Card
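
The usage call above can be wrapped in a small script for a first smoke test. This is a sketch under stated assumptions: `write_silence_wav` and `transcribe_if_available` are hypothetical helpers written for this example, the 16 kHz mono WAV format is an assumption about what the model expects, and the shape of the value returned by `moonshine.transcribe` is not described in this summary, so the script simply prints it.

```python
import importlib.util
import wave


def write_silence_wav(path: str, seconds: float = 1.0, sample_rate: int = 16_000) -> int:
    """Write a mono 16-bit PCM WAV file of silence and return its frame count."""
    n_frames = int(seconds * sample_rate)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)        # mono
        wav.setsampwidth(2)        # 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)
    return n_frames


def transcribe_if_available(path: str, model: str = "moonshine/tiny"):
    """Call moonshine.transcribe if the package is installed, else return None."""
    if importlib.util.find_spec("moonshine") is None:
        return None
    import moonshine  # provided by the useful-moonshine package

    # Same call as in the Usage bullet: audio path first, model name second.
    return moonshine.transcribe(path, model)


if __name__ == "__main__":
    # Replace the placeholder file with a real recording for meaningful output.
    write_silence_wav("audio.wav")
    print(transcribe_if_available("audio.wav"))
```

The import guard keeps the script usable on machines where the package is not installed yet; in that case it prints None instead of failing.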

Highlighted Details

  • Achieves lower WER than Whisper models of comparable size on several OpenASR leaderboard datasets.
  • Processes audio dynamically, leading to faster inference for shorter audio segments.
  • Models are available for the PyTorch, TensorFlow, JAX, and ONNX runtimes.
  • Tiny model is ~190MB, Base model is ~400MB.

Maintenance & Community

  • Models are available on HuggingFace.
  • The README's TODO list, which includes MLX support and fine-tuning code, indicates active development.

Licensing & Compatibility

  • License not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • Currently supports English only.
  • CTranslate2 support is pending a pull request merge. MLX support is listed as a future item.

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 122 stars in the last 90 days
