altunenes: High-performance speech processing in Rust
parakeet-rs is a high-performance Rust library for speech-to-text (ASR) and speaker diarization that runs NVIDIA's Parakeet models through ONNX Runtime. It is aimed at developers who need fast, efficient, streamable audio processing. The project reports significant speed advantages even on CPU, which makes it suitable for real-time applications and resource-constrained environments.
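Streaming use means the model consumes audio incrementally instead of as one complete file. The sketch below shows only the generic slicing step: cutting a 16 kHz sample buffer into fixed-duration chunks. The 160 ms chunk length and the chunk_samples helper are hypothetical choices for illustration, not parakeet-rs parameters or API.

```rust
/// Sample rate the input audio is required to have (16 kHz).
const SAMPLE_RATE_HZ: usize = 16_000;

/// Hypothetical helper: split `samples` into consecutive chunks of `chunk_ms`
/// milliseconds. The last chunk is shorter if the buffer does not divide evenly.
fn chunk_samples(samples: &[f32], chunk_ms: usize) -> Vec<&[f32]> {
    let chunk_len = SAMPLE_RATE_HZ * chunk_ms / 1000;
    assert!(chunk_len > 0, "chunk duration too short for the sample rate");
    samples.chunks(chunk_len).collect()
}

fn main() {
    // One second of silence at 16 kHz.
    let audio = vec![0.0f32; SAMPLE_RATE_HZ];
    // 160 ms is an arbitrary illustrative size (2,560 samples per chunk).
    let chunks = chunk_samples(&audio, 160);
    for (i, chunk) in chunks.iter().enumerate() {
        // A real pipeline would hand each chunk to a streaming model here.
        println!("chunk {i}: {} samples", chunk.len());
    }
}
```

One second of audio yields six full 2,560-sample chunks plus a 640-sample remainder; a real streaming pipeline would decide whether to pad or hold back that final partial chunk.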
How It Works
The library runs NVIDIA's Parakeet models through ONNX Runtime, with support for several hardware acceleration providers (CUDA, TensorRT, WebGPU, DirectML, MIGraphX) and a CPU fallback. Each task has its own model: CTC for English ASR, TDT for multilingual ASR with automatic language detection, EOU for streaming ASR with end-of-utterance detection, Nemotron for cache-aware streaming ASR, Multitalker for streaming multi-speaker ASR, and Sortformer for streaming speaker diarization. This modular design lets a deployment use only the models it needs, on whatever hardware is available.
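The combination of accelerated providers and a CPU fallback comes down to a preference order: try the accelerated backends first, then use the CPU if none is usable. The plain-Rust sketch below models only that selection logic. The Provider enum, the preference order, and the select_provider function are hypothetical illustrations. They do not reflect how parakeet-rs or ONNX Runtime actually register execution providers.

```rust
/// Hypothetical stand-ins for the execution providers named in the text.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Provider {
    TensorRt,
    Cuda,
    MiGraphX,
    DirectMl,
    WebGpu,
    Cpu,
}

/// Return the first provider in `preference` that `is_available` accepts,
/// falling back to CPU when none of them is usable.
fn select_provider(preference: &[Provider], is_available: impl Fn(Provider) -> bool) -> Provider {
    preference
        .iter()
        .copied()
        .find(|&p| is_available(p))
        .unwrap_or(Provider::Cpu)
}

fn main() {
    // Illustrative preference order; a real deployment would choose its own.
    let preference = [Provider::TensorRt, Provider::Cuda, Provider::DirectMl];

    // Simulate a machine where only CUDA is present.
    let chosen = select_provider(&preference, |p| p == Provider::Cuda);
    println!("using {chosen:?}");

    // Simulate a machine with no GPU backend at all: falls back to CPU.
    let fallback = select_provider(&preference, |_| false);
    println!("using {fallback:?}");
}
```

Passing availability in as a closure keeps the sketch testable without any GPU present. A real integration would query the runtime instead.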
Quick Start & Requirements
Installation typically involves adding parakeet-rs as a dependency in the project's Cargo.toml and enabling features such as cuda when GPU acceleration is desired.
Prerequisites: a Rust toolchain; ONNX Runtime (with the matching execution providers for GPU acceleration); and the NVIDIA Parakeet ONNX models and associated files (e.g., model.onnx, tokenizer.json), downloaded separately from Hugging Face. Audio input must be 16 kHz mono WAV (16-
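Because the models expect a specific input format, it can help to reject unsuitable files before inference. The std-only sketch below reads a WAV file's RIFF "fmt " chunk and checks the two constraints stated above: a 16 kHz sample rate and a single channel. It only reports the bit depth and does not enforce one. The parse_wav_format, check_input_requirements, and synthetic_header helpers are hypothetical and are not part of parakeet-rs.

```rust
/// Format fields read from a WAV file's "fmt " chunk.
#[derive(Debug, PartialEq, Eq)]
struct WavFormat {
    channels: u16,
    sample_rate: u32,
    bits_per_sample: u16,
}

fn read_u16(b: &[u8], at: usize) -> Option<u16> {
    let s = b.get(at..at + 2)?;
    Some(u16::from_le_bytes([s[0], s[1]]))
}

fn read_u32(b: &[u8], at: usize) -> Option<u32> {
    let s = b.get(at..at + 4)?;
    Some(u32::from_le_bytes([s[0], s[1], s[2], s[3]]))
}

/// Walk the RIFF chunks and return the format from the "fmt " chunk.
fn parse_wav_format(bytes: &[u8]) -> Result<WavFormat, String> {
    if bytes.get(0..4) != Some(&b"RIFF"[..]) || bytes.get(8..12) != Some(&b"WAVE"[..]) {
        return Err("not a RIFF/WAVE file".into());
    }
    let mut pos = 12;
    while let (Some(id), Some(size)) = (bytes.get(pos..pos + 4), read_u32(bytes, pos + 4)) {
        let body = pos + 8;
        if id == b"fmt " {
            return Ok(WavFormat {
                channels: read_u16(bytes, body + 2).ok_or("truncated fmt chunk")?,
                sample_rate: read_u32(bytes, body + 4).ok_or("truncated fmt chunk")?,
                bits_per_sample: read_u16(bytes, body + 14).ok_or("truncated fmt chunk")?,
            });
        }
        // RIFF chunks are padded to an even number of bytes.
        pos = body + size as usize + (size as usize & 1);
    }
    Err("no fmt chunk found".into())
}

/// Check the constraints stated in the text: 16 kHz and mono.
fn check_input_requirements(fmt: &WavFormat) -> Result<(), String> {
    if fmt.sample_rate != 16_000 {
        return Err(format!("expected 16000 Hz, got {} Hz", fmt.sample_rate));
    }
    if fmt.channels != 1 {
        return Err(format!("expected mono, got {} channels", fmt.channels));
    }
    Ok(())
}

/// Build a minimal PCM WAV header with an empty data chunk, for demonstration.
fn synthetic_header(channels: u16, sample_rate: u32, bits: u16) -> Vec<u8> {
    let block_align = channels * bits / 8;
    let mut h = Vec::new();
    h.extend_from_slice(b"RIFF");
    h.extend_from_slice(&36u32.to_le_bytes()); // size of everything after this field
    h.extend_from_slice(b"WAVE");
    h.extend_from_slice(b"fmt ");
    h.extend_from_slice(&16u32.to_le_bytes()); // PCM fmt chunk body is 16 bytes
    h.extend_from_slice(&1u16.to_le_bytes()); // audio format 1 = PCM
    h.extend_from_slice(&channels.to_le_bytes());
    h.extend_from_slice(&sample_rate.to_le_bytes());
    h.extend_from_slice(&(sample_rate * block_align as u32).to_le_bytes()); // byte rate
    h.extend_from_slice(&block_align.to_le_bytes());
    h.extend_from_slice(&bits.to_le_bytes());
    h.extend_from_slice(b"data");
    h.extend_from_slice(&0u32.to_le_bytes());
    h
}

fn main() {
    // In practice the bytes would come from std::fs::read on the input file.
    for bytes in &[synthetic_header(1, 16_000, 16), synthetic_header(2, 44_100, 16)] {
        let fmt = parse_wav_format(bytes).expect("valid header");
        match check_input_requirements(&fmt) {
            Ok(()) => println!("{:?}: OK", fmt),
            Err(e) => println!("{:?}: rejected ({})", fmt, e),
        }
    }
}
```

The parser walks the chunk list instead of assuming a fixed 44-byte header, because some WAV writers insert extra chunks (such as LIST) before "fmt " or "data".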