parakeet.cpp by mudler

High-performance C++ ASR inference

Created 2 weeks ago

352 stars

Top 79.2% on SourcePulse

Project Summary

Summary

parakeet.cpp is a C++17 inference engine for NVIDIA's NeMo Parakeet ASR models, built on ggml. It targets users who need efficient, dependency-light ASR without a Python runtime at inference time. Its main benefits are faster CPU and GPU inference than NeMo's PyTorch implementation or whisper.cpp, transcripts validated against NeMo's output (WER 0 for offline models, byte-for-byte for streaming), and a design that can be embedded in other applications.

How It Works

The project is a from-scratch C++17 port focused purely on inference. It uses ggml for tensor operations across CPU and GPU backends (CUDA, Metal, etc.) and supports several Parakeet architectures (CTC, RNNT, TDT, hybrid) and sizes, including multilingual and streaming variants. The design prioritizes speed and minimal dependencies, which allows deployment in resource-constrained environments or integration through a flat C API. Transcripts are validated against the original NeMo models.
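Validation of this kind is usually expressed as a word error rate (WER): the word-level edit distance between a reference transcript and a hypothesis transcript, divided by the number of reference words. The following is a minimal C++ sketch of that metric, not code from parakeet.cpp. The names `split_words` and `word_error_rate` are hypothetical, and the sketch compares whitespace-separated tokens without any text normalization.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split a transcript into whitespace-separated words.
static std::vector<std::string> split_words(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    for (std::string w; in >> w;) words.push_back(w);
    return words;
}

// Word error rate: word-level Levenshtein distance divided by the number
// of reference words. For an empty reference this sketch returns 0.0 if the
// hypothesis is also empty and 1.0 otherwise (a convention, not a standard).
double word_error_rate(const std::string& reference,
                       const std::string& hypothesis) {
    const std::vector<std::string> ref = split_words(reference);
    const std::vector<std::string> hyp = split_words(hypothesis);
    if (ref.empty()) return hyp.empty() ? 0.0 : 1.0;

    // prev[j]: distance between the first i-1 reference words and the
    // first j hypothesis words; cur[j]: same for the first i reference words.
    std::vector<std::size_t> prev(hyp.size() + 1), cur(hyp.size() + 1);
    for (std::size_t j = 0; j <= hyp.size(); ++j) prev[j] = j;

    for (std::size_t i = 1; i <= ref.size(); ++i) {
        cur[0] = i;  // deleting i reference words
        for (std::size_t j = 1; j <= hyp.size(); ++j) {
            const std::size_t substitution =
                prev[j - 1] + (ref[i - 1] == hyp[j - 1] ? 0 : 1);
            const std::size_t deletion = prev[j] + 1;
            const std::size_t insertion = cur[j - 1] + 1;
            cur[j] = std::min({substitution, deletion, insertion});
        }
        std::swap(prev, cur);
    }
    return static_cast<double>(prev[hyp.size()]) /
           static_cast<double>(ref.size());
}
```

A WER of 0 means both transcripts contain the same word sequence. With this tokenization that is a weaker condition than a byte-for-byte match, because differences in whitespace are ignored.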

Quick Start & Requirements

Primary install/run: Clone recursively (git clone --recursive https://github.com/mudler/parakeet.cpp), then build with CMake (cmake -B build && cmake --build build -j). For a shared library build, add -DPARAKEET_SHARED=ON to the CMake configure step.
Prerequisites: C++17 compiler, CMake. Python 3.x with torch (CPU) and nemo_toolkit[asr] is needed only for model conversion and validation. Inference requires models in GGUF format.
Links: Official repository: https://github.com/mudler/parakeet.cpp.

Highlighted Details

Performance: Faster than NeMo's PyTorch implementation (up to 1.7x on CPU and 4.3x on GPU) and significantly faster than whisper.cpp.
Accuracy: All offline models validated at WER 0 vs NeMo. Streaming transcripts match NeMo byte-for-byte.
Model Support: CTC, RNNT, TDT, and hybrid TDT-CTC families (110M-1.1B params), multilingual models (25 European languages; 40+ locales for streaming), and prompt-conditioned models.
Features: Cache-aware streaming with EOU detection. GGUF quantization (f16, q8_0, K-quants) for reduced size/memory.
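To illustrate why quantized GGUF files are smaller, here is a minimal sketch of 8-bit block quantization in the style of ggml's q8_0 format: weights are grouped into blocks of 32, and each block stores one scale plus 32 signed 8-bit integers. This is an illustration under stated assumptions, not code from parakeet.cpp or ggml. The names `BlockQ8`, `quantize_q8`, and `dequantize_q8` are hypothetical, and the scale is kept as a 32-bit float for simplicity, whereas ggml stores it in half precision.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kBlockSize = 32;  // values per block, as in q8_0

// One quantized block: a shared scale and 32 signed 8-bit values.
struct BlockQ8 {
    float scale;  // assumption: float here; ggml uses fp16 for the scale
    std::array<std::int8_t, kBlockSize> q;
};

// Quantize x block by block. Trailing values that do not fill a whole
// block are ignored in this sketch.
std::vector<BlockQ8> quantize_q8(const std::vector<float>& x) {
    std::vector<BlockQ8> blocks(x.size() / kBlockSize);
    for (std::size_t b = 0; b < blocks.size(); ++b) {
        const float* src = x.data() + b * kBlockSize;

        // The largest magnitude in the block maps to +/-127.
        float amax = 0.0f;
        for (std::size_t i = 0; i < kBlockSize; ++i)
            amax = std::max(amax, std::fabs(src[i]));

        const float scale = amax / 127.0f;
        const float inv = scale != 0.0f ? 1.0f / scale : 0.0f;

        blocks[b].scale = scale;
        for (std::size_t i = 0; i < kBlockSize; ++i)
            blocks[b].q[i] =
                static_cast<std::int8_t>(std::lround(src[i] * inv));
    }
    return blocks;
}

// Reconstruct approximate float values from the quantized blocks.
std::vector<float> dequantize_q8(const std::vector<BlockQ8>& blocks) {
    std::vector<float> y;
    y.reserve(blocks.size() * kBlockSize);
    for (const BlockQ8& blk : blocks)
        for (std::int8_t v : blk.q)
            y.push_back(blk.scale * static_cast<float>(v));
    return y;
}
```

Each reconstructed value differs from the original by at most about half of the block's scale. With a half-precision scale, a block would occupy 34 bytes instead of the 128 bytes needed for 32 single-precision floats, which is where the size and memory reduction comes from.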

Maintenance & Community

The project is associated with the LocalAI team. The README does not give specific details on maintenance, community channels, or a roadmap.

Licensing & Compatibility

The codebase is MIT licensed. Model weights are governed by NVIDIA's original Parakeet licenses; review them before any commercial use.

Limitations & Caveats

Python is required only for model conversion and validation, not for inference. K-quantization is applied with the CLI tool after conversion. On GPU, the speedup over NeMo may be smaller for some models (e.g., pure CTC) because performance depends on ggml's kernels.

Health Check

Last Commit: 1 day ago
Responsiveness: Inactive
Star History: 352 stars in the last 15 days