parakeet.cpp by mudler

High-performance C++ ASR inference

Created 2 weeks ago

352 stars

Top 79.2% on SourcePulse

Project Summary

Summary

parakeet.cpp is a C++17 inference engine for NVIDIA's NeMo Parakeet automatic speech recognition (ASR) models, built on ggml. It targets users who need efficient, dependency-light ASR without a Python runtime at inference time. Key benefits are substantially faster CPU and GPU inference than NeMo's PyTorch implementation or whisper.cpp, transcripts that match NeMo's output, and easy embedding in other applications.

How It Works

This project is a from-scratch C++17 port focused solely on inference. It uses ggml for tensor operations on CPU and GPU backends (CUDA, Metal, and others). It supports the CTC, RNNT, TDT, and hybrid Parakeet architectures across several model sizes, including multilingual and streaming variants. The design prioritizes speed and minimal dependencies, which suits resource-constrained deployments and integration into other programs through a flat C API. Transcripts are validated against the original NeMo models.
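
To make the CTC family concrete: a CTC model emits one score vector per audio frame, and the simplest decoder picks the best token in each frame, merges consecutive repeats, and drops the blank token. The following is a generic greedy CTC sketch with a hypothetical function name (ctc_greedy_decode), not parakeet.cpp's decoder; the blank index is passed in rather than assumed. RNNT and TDT models add a prediction network and a joint network (TDT also predicts token durations), so they decode differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, NOT part of parakeet.cpp's API: greedy (best-path) CTC decoding.
// logits:   T frames, each holding V per-token scores.
// blank_id: index of the CTC blank token in the vocabulary.
std::vector<int> ctc_greedy_decode(const std::vector<std::vector<float>>& logits, int blank_id) {
    std::vector<int> tokens;
    int prev = -1;  // argmax of the previous frame; -1 means "no frame seen yet"
    for (const auto& frame : logits) {
        assert(!frame.empty());  // every frame must carry at least one score
        // Pick the highest-scoring token for this frame.
        int best = 0;
        for (std::size_t v = 1; v < frame.size(); ++v) {
            if (frame[v] > frame[best]) best = static_cast<int>(v);
        }
        // CTC collapse rule: merge consecutive repeats, then drop blanks.
        if (best != prev && best != blank_id) tokens.push_back(best);
        prev = best;
    }
    return tokens;
}
```

For per-frame argmaxes 1, 1, blank, 1, 2, 2 (with blank = 0) this returns 1, 1, 2: the blank between the two runs of token 1 is what lets a genuinely repeated token survive the merge.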

Quick Start & Requirements

  • Build and run: clone the repository recursively (git clone --recursive https://github.com/mudler/parakeet.cpp), then build with CMake (cmake -B build && cmake --build build -j). To build a shared library, pass -DPARAKEET_SHARED=ON when configuring.
  • Prerequisites: a C++17 compiler and CMake. Python 3.x with torch (CPU build) and nemo_toolkit[asr] is needed only for model conversion and validation. Inference requires models in GGUF format.
  • Repository: https://github.com/mudler/parakeet.cpp

Highlighted Details

  • Performance: faster than NeMo's PyTorch implementation (up to 1.7x on CPU and 4.3x on GPU) and substantially faster than whisper.cpp.
  • Accuracy: all offline models are validated at a WER of 0 against NeMo, and streaming transcripts match NeMo byte for byte.
  • Model support: CTC, RNNT, TDT, and hybrid TDT-CTC families (110M to 1.1B parameters), multilingual models (25 European languages; 40+ locales for streaming), and prompt-conditioned models.
  • Features: cache-aware streaming with end-of-utterance (EOU) detection, and GGUF quantization (f16, q8_0, K-quants) to reduce model size and memory use.
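
The accuracy claim is expressed as a word error rate (WER) of 0 relative to NeMo. As a minimal sketch, assuming plain whitespace tokenization and no text normalization (the project's validation scripts may normalize differently), WER is the word-level edit distance between reference and hypothesis divided by the number of reference words. The names word_error_rate and split_words are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Split a transcript on whitespace into words (no case or punctuation normalization).
std::vector<std::string> split_words(const std::string& text) {
    std::istringstream in(text);
    std::vector<std::string> words;
    std::string w;
    while (in >> w) words.push_back(w);
    return words;
}

// WER = (substitutions + deletions + insertions) / reference word count,
// computed as a word-level Levenshtein distance with a two-row DP table.
double word_error_rate(const std::string& reference, const std::string& hypothesis) {
    const auto ref = split_words(reference);
    const auto hyp = split_words(hypothesis);
    // WER is undefined for an empty reference; returning 0 or 1 here is an assumed convention.
    if (ref.empty()) return hyp.empty() ? 0.0 : 1.0;
    // prev holds the previous DP row: distances from the first i-1 reference words
    // to each hypothesis prefix; cur is the row being filled for i reference words.
    std::vector<std::size_t> prev(hyp.size() + 1), cur(hyp.size() + 1);
    for (std::size_t j = 0; j <= hyp.size(); ++j) prev[j] = j;  // j insertions
    for (std::size_t i = 1; i <= ref.size(); ++i) {
        cur[0] = i;  // i deletions
        for (std::size_t j = 1; j <= hyp.size(); ++j) {
            const std::size_t sub = prev[j - 1] + (ref[i - 1] == hyp[j - 1] ? 0 : 1);
            cur[j] = std::min({sub, prev[j] + 1, cur[j - 1] + 1});
        }
        std::swap(prev, cur);
    }
    return static_cast<double>(prev[hyp.size()]) / static_cast<double>(ref.size());
}
```

A WER of 0 only requires identical word sequences, so under this definition it is a weaker check than the byte-for-byte match reported for streaming: differences in spacing would not count as errors.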
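
The q8_0 option refers to ggml's 8-bit block format, in which each block of 32 weights shares one scale d = max|x| / 127 and every weight is stored as round(x / d) in an int8. The sketch below illustrates that idea with hypothetical names (BlockQ8, quantize_q8, dequantize_q8) and stores the scale as a float, whereas ggml stores it as fp16; it is not ggml's or parakeet.cpp's code, and the K-quants use a different, more elaborate super-block layout that is not shown here.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of weights per block, matching ggml's Q8_0 block size.
constexpr std::size_t kBlock = 32;

// Simplified Q8_0-style block (hypothetical; ggml keeps d as fp16, a float is used here).
struct BlockQ8 {
    float d;                             // per-block scale
    std::array<std::int8_t, kBlock> qs;  // quantized weights
};

// Symmetric 8-bit quantization; x.size() must be a multiple of kBlock.
std::vector<BlockQ8> quantize_q8(const std::vector<float>& x) {
    assert(x.size() % kBlock == 0);
    std::vector<BlockQ8> out(x.size() / kBlock);
    for (std::size_t b = 0; b < out.size(); ++b) {
        const float* src = x.data() + b * kBlock;
        float amax = 0.0f;  // largest absolute value in this block
        for (std::size_t j = 0; j < kBlock; ++j) amax = std::fmax(amax, std::fabs(src[j]));
        const float d = amax / 127.0f;  // maps [-amax, amax] onto [-127, 127]
        const float id = d != 0.0f ? 1.0f / d : 0.0f;  // all-zero block stays zero
        out[b].d = d;
        for (std::size_t j = 0; j < kBlock; ++j)
            out[b].qs[j] = static_cast<std::int8_t>(std::lround(src[j] * id));
    }
    return out;
}

// Reconstruct approximate weights: x ~= qs * d.
std::vector<float> dequantize_q8(const std::vector<BlockQ8>& blocks) {
    std::vector<float> x;
    x.reserve(blocks.size() * kBlock);
    for (const auto& blk : blocks)
        for (std::int8_t q : blk.qs) x.push_back(q * blk.d);
    return x;
}
```

With an fp16 scale, each block occupies 32 bytes of int8 values plus 2 bytes of scale, about 8.5 bits per weight versus 16 for f16, and the per-weight reconstruction error is at most half the block's scale.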

Maintenance & Community

Associated with the LocalAI team. The README does not specify maintainers, community channels, or a roadmap.

Licensing & Compatibility

The codebase is MIT licensed. Model weights are governed by NVIDIA's original Parakeet licenses; review these before commercial use.

Limitations & Caveats

Python is required only for model conversion and validation, not for inference. K-quantization requires running the CLI tool after conversion. On GPU, the speedup over NeMo may be smaller for some models (e.g., pure CTC), because performance depends on ggml's kernels.

Health Check

  • Last commit: 1 day ago
  • Responsiveness: Inactive
  • Pull requests (30d): 15
  • Issues (30d): 6
  • Star history: 352 stars in the last 15 days
