CrispASR by CrispStrobe

Unified C++ speech engine for ASR and TTS

Created 4 months ago

467 stars

Top 64.3% on SourcePulse

Project Summary

CrispASR provides a unified, high-performance C++ speech engine, consolidating numerous Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) models into a single, dependency-light binary. It targets engineers and power users seeking efficient speech processing capabilities without the overhead of Python environments. The project enables rapid deployment and experimentation across diverse models and languages via a consistent command-line interface and API.

How It Works

Originating as a fork of whisper.cpp, CrispASR uses the ggml C++ runtime to support a broad range of ASR and TTS architectures, including models from OpenAI, NVIDIA, Mistral AI, Qwen, and IBM Granite. Its core innovation is a single C++ binary that auto-detects and loads models of many different architectures packaged in the GGUF format, eliminating the need for separate Python installations or per-model dependencies. This approach streamlines integration and deployment for transcription, translation, speech synthesis, and alignment tasks.
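To illustrate what auto-detecting a GGUF model involves, the Python sketch below reads a GGUF header (magic bytes, version, tensor count, metadata count) and scans the metadata for the standard general.architecture key, which a loader could dispatch on. This is not CrispASR's loader. It assumes GGUF version 2 or later (64-bit counts), and select_backend with its caller-supplied registry is a hypothetical helper.

```python
import struct

GGUF_MAGIC = b"GGUF"

# Fixed-size GGUF scalar value types (type id -> little-endian struct format).
_SCALAR_FORMATS = {
    0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
    6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d",
}
_STRING, _ARRAY = 8, 9


def _read_string(buf, off):
    """GGUF strings are a uint64 byte length followed by UTF-8 bytes."""
    (length,) = struct.unpack_from("<Q", buf, off)
    off += 8
    return buf[off:off + length].decode("utf-8"), off + length


def _read_value(buf, off, vtype):
    """Read one metadata value of type vtype; return (value, new_offset)."""
    if vtype in _SCALAR_FORMATS:
        fmt = _SCALAR_FORMATS[vtype]
        (value,) = struct.unpack_from(fmt, buf, off)
        return value, off + struct.calcsize(fmt)
    if vtype == _STRING:
        return _read_string(buf, off)
    if vtype == _ARRAY:
        # Arrays store an element type (uint32) and an element count (uint64).
        elem_type, count = struct.unpack_from("<IQ", buf, off)
        off += 12
        items = []
        for _ in range(count):
            item, off = _read_value(buf, off, elem_type)
            items.append(item)
        return items, off
    raise ValueError(f"unknown GGUF value type {vtype}")


def gguf_architecture(buf):
    """Return the general.architecture string from GGUF bytes, or None."""
    if buf[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    version, _n_tensors, n_kv = struct.unpack_from("<IQQ", buf, 4)
    if version < 2:
        raise ValueError("GGUF v1 uses 32-bit counts; not handled here")
    off = 24  # 4 (magic) + 4 (version) + 8 (tensor count) + 8 (kv count)
    for _ in range(n_kv):
        key, off = _read_string(buf, off)
        (vtype,) = struct.unpack_from("<I", buf, off)
        off += 4
        value, off = _read_value(buf, off, vtype)
        if key == "general.architecture":
            return value
    return None


def select_backend(buf, registry):
    """Hypothetical dispatch: look up a loader in a {architecture: loader} dict."""
    arch = gguf_architecture(buf)
    if arch not in registry:
        raise KeyError(f"no backend registered for architecture {arch!r}")
    return registry[arch]
```

In practice only the start of the file is needed, since metadata precedes the tensor data; reading the whole file with open(path, "rb").read() also works for small models.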

Quick Start & Requirements

Build with CMake (cmake -B build -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc)). Requires a C++17 compiler and CMake 3.14+. GPU acceleration is supported via CMake flags (-DGGML_CUDA=ON, -DGGML_METAL=ON, -DGGML_VULKAN=ON). Crucially, CrispASR has zero runtime Python dependencies. Detailed installation and build instructions, including GPU backend configurations, are available in docs/install.md.

Highlighted Details

Supports over 26 ASR backends and 9 TTS engines with extensive multilingual capabilities.
Features a single C++ binary with no Python/PyTorch runtime dependencies.
Offers a CLI, an OpenAI-compatible HTTP server, and bindings for Python, Rust, and Dart.
Includes advanced functionalities like speaker diarization, hotword detection, universal forced alignment, speech translation, and Voice Activity Detection (VAD).
Models can be auto-downloaded using the -m auto flag.
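Because the HTTP server is described as OpenAI-compatible, a client should be able to use the OpenAI audio transcription request shape: a multipart/form-data POST carrying a file field and a model field. The sketch below builds such a request with only the Python standard library and does not send it. The /v1/audio/transcriptions path follows OpenAI's API and is an assumption for CrispASR; the base URL and the model value in the usage are placeholders.

```python
import uuid
import urllib.request


def build_transcription_request(base_url, audio_bytes, model,
                                filename="audio.wav", language=None):
    """Build (but do not send) a multipart POST to an OpenAI-style
    /v1/audio/transcriptions endpoint (path assumed, not confirmed)."""
    boundary = uuid.uuid4().hex
    fields = {"model": model}
    if language:
        fields["language"] = language

    parts = []
    # Plain text form fields.
    for name, value in fields.items():
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
        )
        parts.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    # The audio file itself.
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + audio_bytes + b"\r\n")
    # Closing boundary.
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))

    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

Sending it is then a matter of urllib.request.urlopen(req). An OpenAI-style server returns JSON with the transcript in a text field, but the exact response fields CrispASR returns should be checked against its server documentation.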
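Voice Activity Detection splits audio into speech and non-speech regions before recognition. As a generic illustration only (a simple frame-energy threshold, not one of the VAD providers CrispASR ships), the sketch below marks frames whose RMS level exceeds a dBFS threshold and merges consecutive speech frames into segments. The 30 ms frame length and the -40 dBFS threshold are arbitrary assumptions.

```python
import math


def energy_vad(samples, sample_rate, frame_ms=30, threshold_db=-40.0):
    """Return (start_s, end_s) speech segments from float samples in [-1, 1].

    A frame counts as speech when its RMS level in dBFS exceeds threshold_db;
    runs of consecutive speech frames are merged into one segment.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    segments = []
    start = None  # sample index where the current speech run began
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        level_db = 20 * math.log10(rms) if rms > 0 else float("-inf")
        if level_db > threshold_db:
            if start is None:
                start = i
        elif start is not None:
            segments.append((start / sample_rate, i / sample_rate))
            start = None
    # Close a speech run that lasts until the end of the audio.
    if start is not None:
        segments.append((start / sample_rate, len(samples) / sample_rate))
    return segments
```

Production VADs are usually neural (Silero VAD is a common example) and add hangover smoothing so that short pauses do not split a segment; this sketch omits both.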

Maintenance & Community

The provided documentation does not detail specific community channels (e.g., Discord, Slack), roadmaps, or notable contributors/sponsorships.

Licensing & Compatibility

The CrispASR binary is released under the MIT license, matching whisper.cpp, from which it was forked. Model weights are governed by their respective Hugging Face licenses, which are predominantly permissive (MIT, Apache-2.0, CC-BY-4.0) and generally allow commercial use, though each model's license still needs to be reviewed individually.

Limitations & Caveats

On CPU-only hardware, the more complex Speech-LLM backends can run significantly slower than real time. Some features, such as specific VAD providers, are marked as experimental. The project publicly lists ongoing development plans and feature additions, so the API surface may still change.
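Speed relative to real time is usually expressed as a real-time factor (RTF): processing time divided by audio duration, so an RTF above 1.0 means transcription takes longer than the audio itself. A minimal sketch for measuring it around any caller-supplied transcription callable (a hypothetical stand-in for a CLI invocation or binding call):

```python
import time


def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(transcribe, audio_seconds):
    """Time a zero-argument transcribe() callable and return its RTF."""
    start = time.perf_counter()
    transcribe()
    return real_time_factor(time.perf_counter() - start, audio_seconds)
```

For example, a backend that needs 120 s to process 60 s of audio has an RTF of 2.0, meaning it runs at half real-time speed.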


Explore Similar Projects

kitten_tts_rs by second-state

onnx-asr by istupakov

transcribe.cpp by handy-computer

Auralis by astramind-ai

RapidASR by RapidAI

audio.cpp by 0xShug0

TensorflowASR by Z-yq

sherpa by k2-fsa

QuickAgent by gkamradt

whisper-asr-webservice by ahmetoner

wenet by wenet-e2e

tortoise-tts by neonbjb