parakeet-rs by altunenes

High-performance speech processing in Rust

Created 8 months ago

362 stars

Top 77.4% on SourcePulse

Project Summary

parakeet-rs is a Rust library for speech-to-text (ASR) and speaker diarization that runs NVIDIA's Parakeet models through ONNX Runtime. It targets developers who need fast, streamable audio processing. The project reports good speed even on CPU, which makes it a fit for real-time applications and resource-constrained environments.

How It Works

The library runs NVIDIA's Parakeet models through ONNX Runtime, with support for several hardware acceleration providers (CUDA, TensorRT, WebGPU, DirectML, MIGraphX) and a CPU fallback. Each task has its own model: CTC for English ASR, TDT for multilingual ASR with auto-detection, EOU for streaming ASR with end-of-utterance detection, Nemotron for cache-aware streaming ASR, Multitalker for streaming multi-speaker ASR, and Sortformer for streaming speaker diarization. Because the models are separate, a deployment can load only the ones it needs and pair them with whichever execution provider the target hardware supports.
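The model-to-task mapping and the provider fallback described above can be restated as a small, standard-library-only sketch. Every name here (`ModelFamily`, `Provider`, `pick_provider`) is hypothetical and chosen for illustration; none of it is the parakeet-rs API, and in practice ONNX Runtime handles provider selection itself.

```rust
// Hypothetical types for illustration only; this is NOT the parakeet-rs API.
// The mapping below restates the model descriptions from the summary.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ModelFamily {
    Ctc,
    Tdt,
    Eou,
    Nemotron,
    Multitalker,
    Sortformer,
}

impl ModelFamily {
    /// Task each model family is described as serving.
    pub fn task(self) -> &'static str {
        match self {
            ModelFamily::Ctc => "English ASR",
            ModelFamily::Tdt => "multilingual ASR with auto-detection",
            ModelFamily::Eou => "streaming ASR with end-of-utterance detection",
            ModelFamily::Nemotron => "cache-aware streaming ASR",
            ModelFamily::Multitalker => "streaming multi-speaker ASR",
            ModelFamily::Sortformer => "streaming speaker diarization",
        }
    }

    /// True for the families the summary explicitly describes as streaming.
    pub fn is_streaming(self) -> bool {
        matches!(
            self,
            ModelFamily::Eou
                | ModelFamily::Nemotron
                | ModelFamily::Multitalker
                | ModelFamily::Sortformer
        )
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Provider {
    Cuda,
    TensorRt,
    WebGpu,
    DirectMl,
    MiGraphX,
    Cpu,
}

/// Return the first preferred provider that is available, otherwise CPU.
/// This mirrors the "accelerator first, CPU fallback" idea in plain Rust.
pub fn pick_provider(preferred: &[Provider], available: &[Provider]) -> Provider {
    preferred
        .iter()
        .copied()
        .find(|p| available.contains(p))
        .unwrap_or(Provider::Cpu)
}
```

For example, `pick_provider(&[Provider::TensorRt, Provider::Cuda], &[Provider::Cuda])` selects `Provider::Cuda`, and an empty `available` slice falls back to `Provider::Cpu`.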

Quick Start & Requirements

Installation typically involves adding parakeet-rs as a dependency in a Rust project's Cargo.toml and enabling a feature such as cuda when GPU acceleration is wanted. Prerequisites: a Rust toolchain, ONNX Runtime (with the matching execution providers for GPU acceleration), and NVIDIA Parakeet ONNX models with their associated files (e.g., model.onnx, tokenizer.json), downloaded separately from Hugging Face. Audio input must be 16 kHz mono WAV (16-
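Since the input must be 16 kHz mono WAV, it can help to check a file's header before handing it to a model. The following is a minimal sketch using only the standard library: it walks the RIFF chunks, reads the `fmt ` chunk, and reports the channel count and sample rate. `WavFormat`, `parse_wav_format`, and `is_16khz_mono` are hypothetical helper names, not part of parakeet-rs, and the sketch checks only the two properties stated in full above.

```rust
// Hypothetical helpers for illustration; not part of parakeet-rs.

#[derive(Debug, PartialEq, Eq)]
pub struct WavFormat {
    pub channels: u16,
    pub sample_rate: u32,
}

// Read little-endian integers at a byte offset (caller checks bounds).
fn le_u16(b: &[u8], at: usize) -> u16 {
    u16::from_le_bytes([b[at], b[at + 1]])
}

fn le_u32(b: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([b[at], b[at + 1], b[at + 2], b[at + 3]])
}

/// Parse the `fmt ` chunk of a RIFF/WAVE byte buffer.
/// Returns None if the buffer is not a WAVE file or has no usable `fmt ` chunk.
pub fn parse_wav_format(bytes: &[u8]) -> Option<WavFormat> {
    if bytes.len() < 12 || &bytes[0..4] != b"RIFF" || &bytes[8..12] != b"WAVE" {
        return None;
    }
    let mut pos = 12;
    // Each chunk is: 4-byte id, 4-byte little-endian size, then the body.
    while pos <= bytes.len() - 8 {
        let size = le_u32(bytes, pos + 4) as usize;
        let body = pos + 8;
        if &bytes[pos..pos + 4] == b"fmt " {
            if size < 16 || bytes.len() - body < 16 {
                return None;
            }
            return Some(WavFormat {
                channels: le_u16(bytes, body + 2),
                sample_rate: le_u32(bytes, body + 4),
            });
        }
        // Chunk bodies are padded to an even number of bytes.
        pos = body.saturating_add(size).saturating_add(size & 1);
    }
    None
}

/// True when the header declares 16 kHz, single-channel audio.
pub fn is_16khz_mono(fmt: &WavFormat) -> bool {
    fmt.sample_rate == 16_000 && fmt.channels == 1
}
```

A caller could read the file with `std::fs::read`, pass the bytes to `parse_wav_format`, and resample or downmix with a separate tool when `is_16khz_mono` returns false.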

