Frikallo
High-performance C++ speech AI inference engine
Top 98.2% on SourcePulse
Summary
parakeet.cpp is a fast, portable C++ implementation of on-device speech recognition built on NVIDIA's Parakeet models. It targets developers and power users who want high-performance Automatic Speech Recognition (ASR) without the overhead of a Python or ONNX runtime. The project reports significant speedups, particularly on Apple Silicon GPUs, through Metal acceleration in its custom axiom tensor library.
How It Works
The project is built on a pure C++ architecture centered around axiom, a lightweight tensor library featuring automatic Metal GPU acceleration. It employs a shared FastConformer encoder and supports diverse decoders, including CTC, TDT, and RNNT, alongside specialized streaming models. This design bypasses traditional dependencies, enabling efficient on-device inference through optimized Metal GPU operations and FP16 support for reduced memory footprint and enhanced speed.
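To make the decoder side of this design concrete, here is a minimal sketch of greedy CTC decoding, the simplest of the decoder types listed above. It is not taken from parakeet.cpp or axiom: the function name `ctc_greedy_decode`, the nested-vector logits layout, and the `blank_id` parameter are all assumptions chosen for illustration. The sketch picks the highest-scoring token for each encoder frame, collapses consecutive repeats, and drops blanks.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper for illustration only; not part of parakeet.cpp or axiom.
// logits:   one row per encoder frame, one column per vocabulary entry.
// blank_id: index of the CTC blank symbol (its position is an assumption here).
std::vector<int> ctc_greedy_decode(const std::vector<std::vector<float>>& logits,
                                   int blank_id) {
    std::vector<int> tokens;
    int prev = blank_id;  // treat the position before the first frame as blank
    for (const auto& frame : logits) {
        if (frame.empty()) continue;  // skip malformed frames

        // Argmax over the vocabulary for this frame.
        std::size_t best = 0;
        for (std::size_t i = 1; i < frame.size(); ++i) {
            if (frame[i] > frame[best]) best = i;
        }
        const int id = static_cast<int>(best);

        // Emit only when the token is not blank and differs from the
        // previous frame's token (repeats collapse unless split by a blank).
        if (id != blank_id && id != prev) tokens.push_back(id);
        prev = id;
    }
    return tokens;
}
```

With a three-entry vocabulary where index 2 is the blank, frames whose best tokens are 0, 0, blank, 0, 1 decode to the token sequence 0, 0, 1: the first two frames collapse into one token, and the blank separates them from the second 0. TDT and RNNT decoders are more involved because they condition on previously emitted tokens, so they are not covered by this sketch.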
Quick Start & Requirements
Clone the repository with git clone --recursive, then build with make build.
Dependencies: axiom (included), safetensors, torch (for weight conversion), dr_libs, stb_vorbis (audio handling).
Models must be downloaded (e.g., nvidia/parakeet-tdt_ctc-110m) and converted using the provided Python script (scripts/convert_nemo.py).
Highlighted Details
axiom.
Maintenance & Community
The project's roadmap is detailed within the README, indicating active development focus. No specific community channels (e.g., Discord, Slack) or notable contributors/sponsorships are mentioned.
Licensing & Compatibility
Limitations & Caveats
GPU acceleration is currently limited to Apple Silicon hardware running macOS 13+. Offline models have approximate audio length limits of 4-5 minutes; longer audio requires using the dedicated streaming models. Model conversion from HuggingFace .nemo files to the project's .safetensors format is a necessary prerequisite.