moonshine by moonshine-ai

Speech-to-text models optimized for fast, accurate ASR on edge devices

Created 1 year ago

3,067 stars

Top 15.5% on SourcePulse

2 Experts Love This Project

hammer

Jeff Hammerbacher

Cofounder of Cloudera

jph00

Cofounder of fast.ai

Project Summary

Moonshine is a family of automatic speech recognition (ASR) models designed for fast and accurate transcription on resource-constrained edge devices. It targets real-time applications such as live captioning and voice commands, with word error rates (WER) competitive with those of similarly sized OpenAI Whisper models.

How It Works

Moonshine's architecture scales its compute with the length of the input audio instead of padding every input to a fixed 30-second chunk, as Whisper does. Shorter inputs are therefore processed much faster, with a claimed 5x speedup over Whisper for 10-second segments at comparable or better accuracy.
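As a rough illustration of why variable-length processing helps short inputs, the sketch below counts how many audio samples a model has to consume with and without a fixed 30-second window. The 16 kHz sample rate and the helper name samples_processed are assumptions made for this example. The resulting ratio describes input size only; it is not a measured speedup and does not reproduce the 5x figure claimed above.

```python
import math

SAMPLE_RATE_HZ = 16_000   # assumed sample rate for this illustration
FIXED_WINDOW_S = 30       # fixed chunk length used by Whisper-style models


def samples_processed(duration_s, fixed_window_s=None):
    """Return the number of samples a model must consume for one clip.

    With fixed_window_s set, the clip is padded up to a whole number of
    fixed-size windows. With None, only the real audio is processed.
    """
    if fixed_window_s is None:
        return round(duration_s * SAMPLE_RATE_HZ)
    windows = max(1, math.ceil(duration_s / fixed_window_s))
    return windows * fixed_window_s * SAMPLE_RATE_HZ


clip_s = 10
variable = samples_processed(clip_s)
padded = samples_processed(clip_s, FIXED_WINDOW_S)
print(f"variable-length: {variable} samples")
print(f"fixed window:    {padded} samples")
print(f"padding overhead: {padded / variable:.1f}x more input")
```

For a 10-second clip the padded input is three times larger than the real audio, which is the kind of waste a variable-length design avoids.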

Quick Start & Requirements

Installation: Install with uv's pip interface: uv pip install useful-moonshine (for Torch, TensorFlow, JAX) or uv pip install useful-moonshine-onnx (for ONNX).
Prerequisites: Python 3.x. GPU with CUDA is recommended for faster inference, especially with JAX.
Usage: moonshine.transcribe('audio.wav', 'moonshine/tiny')
Models: Available via HuggingFace Hub (UsefulSensors/moonshine-tiny, UsefulSensors/moonshine-base).
Demos: Live captions, browser demo available.
Docs: Blog, Paper, Model Card
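The usage line above can be wrapped in a small script. This is a minimal sketch, assuming the non-ONNX package is Keras-based (as the Torch, TensorFlow, JAX wording suggests) and therefore honors the standard KERAS_BACKEND environment variable. The wrapper name transcribe_file is hypothetical; only moonshine.transcribe and the 'moonshine/tiny' model name come from the text above.

```python
import os

# Keras 3 reads KERAS_BACKEND when it is first imported, so choose the
# backend before importing anything that depends on Keras.
# Assumption: the useful-moonshine package is Keras-based.
os.environ.setdefault("KERAS_BACKEND", "torch")


def transcribe_file(path, model="moonshine/tiny"):
    """Transcribe one audio file using the call shown in the Usage line.

    The import is done lazily so this module can be loaded even on a
    machine where the package has not been installed yet.
    """
    import moonshine  # provided by the useful-moonshine package

    return moonshine.transcribe(path, model)


# Example call (requires the package and a local audio file):
# print(transcribe_file("audio.wav"))
```

Setting the backend with setdefault leaves any value already exported in the shell untouched, so the script does not override an existing configuration.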

Highlighted Details

Achieves lower WER than Whisper models of comparable size on several OpenASR leaderboard datasets.
Processes variable-length audio without padding to a fixed window, so shorter segments run faster.
Models are available in PyTorch, TensorFlow, JAX, and ONNX runtimes.
Tiny model is ~190MB, Base model is ~400MB.
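The WER comparison in the first item above uses the standard word error rate: the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below shows that general definition. The function name and the lowercase, whitespace-split normalization are assumptions made for this example, and it is not the scoring code used by the OpenASR leaderboard.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Compute WER as word-level Levenshtein distance / reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from output
            insertion = curr[j - 1] + 1   # extra word in the output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One deleted word ("the") and one substituted word ("lights" -> "light")
# against a five-word reference.
print(word_error_rate("turn on the kitchen lights", "turn on kitchen light"))
```

Lower is better, and because insertions count as errors, WER can exceed 1.0 when the output is much longer than the reference.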

Maintenance & Community

Models are available on HuggingFace.
A TODO list that includes MLX support and fine-tuning code indicates active development.

Licensing & Compatibility

License not explicitly stated in the README. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Currently supports English only.
CTranslate2 support is pending a pull request merge. MLX support is listed as a future item.

Health Check

Last Commit: 1 month ago
Responsiveness: 1 day
Pull Requests (30d): 1
Issues (30d): 2

Star History

43 stars in the last 30 days

Explore Similar Projects

csm-mlx by senstella

Speech model for Apple Silicon using MLX

Created 10 months ago

Updated 4 months ago

Starred by

Elvis Saravia (Founder of DAIR.AI).

S.A.T.U.R.D.A.Y by GRVYDEV

Vocal computing toolbox for building voice interfaces to LLMs

Created 2 years ago

Updated 2 years ago

edgedict by theblackcat102

RNN-Transducer for online speech recognition

Created 5 years ago

Updated 4 years ago

LiveWhisper by Nikorasu

Live transcription tool using OpenAI's Whisper

Created 3 years ago

Updated 5 months ago

Starred by

Georgi Gerganov (Author of llama.cpp, whisper.cpp).

whisper.rn by mybigday

React Native binding for high-performance local speech recognition

Created 2 years ago

Updated 1 month ago

Starred by

Travis Fischer (Founder of Agentic).

ollama-voice-mac by apeatling

Offline voice assistant for macOS

Created 2 years ago

Updated 4 months ago

Starred by

Jeff Hammerbacher (Cofounder of Cloudera), Theo Browne (Founder of Ping.gg), and 1 more.

dia2 by nari-labs

Streaming dialogue TTS for real-time conversational audio

Created 1 month ago

Updated 1 month ago

Starred by

Jong Wook Kim (Research Scientist at OpenAI).

realtime-transcription-fastrtc by sofdog-gh

Real-time transcription tool using local Whisper models

Created 10 months ago

Updated 6 months ago

Starred by

Georgi Gerganov (Author of llama.cpp, whisper.cpp).

whisper.net by sandrohanea

.NET library for speech-to-text using Whisper models

Created 2 years ago

Updated 1 week ago

openWakeWord by dscripka

Open-source wakeword detection library for voice-enabled apps

Created 3 years ago

Updated 1 week ago

Starred by

Amin Ahmad (Cofounder of Vectara) and Jeremy Howard (Cofounder of fast.ai).

whisper_streaming by ufal

Real-time streaming for long speech-to-text transcription/translation

Created 2 years ago

Updated 2 months ago

sherpa-onnx by k2-fsa

Speech toolkit for local, offline speech AI tasks via ONNX

Created 3 years ago

Updated 11 hours ago
