CrispASR by CrispStrobe

Unified C++ speech engine for ASR and TTS

Created 2 months ago
284 stars

Top 92.1% on SourcePulse

Project Summary

CrispASR is a unified, high-performance C++ speech engine that consolidates numerous Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) models into a single, dependency-light binary. It targets engineers and power users who want efficient speech processing without the overhead of Python environments. A consistent command-line interface and API allow rapid deployment and experimentation across diverse models and languages.

How It Works

Originating as a fork of whisper.cpp, CrispASR utilizes the ggml C++ runtime to support a broad spectrum of ASR and TTS architectures, including models from OpenAI, NVIDIA, Mistral AI, Qwen, and IBM Granite. Its core innovation is a single C++ binary that auto-detects and loads various GGUF model formats, eliminating the need for separate Python installations or per-model dependencies. This approach streamlines integration and deployment for transcription, translation, speech synthesis, and alignment tasks.
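The source does not describe how CrispASR's auto-detection is implemented. As a rough illustration of what any GGUF loader has to inspect first, the sketch below reads the fixed-size GGUF file header (magic bytes, format version, tensor count, metadata entry count). It assumes a little-endian file in GGUF version 2 or later, where both counts are 64-bit; the names read_gguf_header and GGUFHeader are hypothetical and not part of CrispASR.

```python
import struct
from typing import BinaryIO, NamedTuple

GGUF_MAGIC = b"GGUF"   # first four bytes of every GGUF file
HEADER_SIZE = 24       # 4 (magic) + 4 (version) + 8 (tensors) + 8 (metadata)


class GGUFHeader(NamedTuple):
    """Hypothetical container for the fixed part of a GGUF header."""
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(stream: BinaryIO) -> GGUFHeader:
    """Read the fixed GGUF header from a binary stream.

    Assumes little-endian byte order and GGUF version >= 2
    (64-bit tensor and metadata counts).
    """
    raw = stream.read(HEADER_SIZE)
    if len(raw) < HEADER_SIZE or raw[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    # "<IQQ": uint32 version, uint64 tensor count, uint64 metadata count
    version, tensor_count, kv_count = struct.unpack("<IQQ", raw[4:HEADER_SIZE])
    return GGUFHeader(version, tensor_count, kv_count)
```

In the GGUF convention, the model architecture is stored in the metadata entries that follow this header (under the key general.architecture), so a loader would typically continue parsing from here to decide which backend to use.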

Quick Start & Requirements

Build with CMake (cmake -B build -DCMAKE_BUILD_TYPE=Release && cmake --build build -j$(nproc)). Requires a C++17 compiler and CMake 3.14+. GPU acceleration is supported via CMake flags (-DGGML_CUDA=ON, -DGGML_METAL=ON, -DGGML_VULKAN=ON). Crucially, CrispASR has zero runtime Python dependencies. Detailed installation and build instructions, including GPU backend configurations, are available in docs/install.md.
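For scripted builds, the two CMake invocations above can be assembled programmatically. The following is a minimal sketch, not an official build script: the helper name cmake_commands and the backend keys are hypothetical, while the flags themselves are the ones quoted above. It only constructs the argument lists and does not run anything.

```python
import os
from typing import List, Optional

# GPU backend flags as quoted in the build instructions above.
GPU_FLAGS = {
    "cuda": "-DGGML_CUDA=ON",
    "metal": "-DGGML_METAL=ON",
    "vulkan": "-DGGML_VULKAN=ON",
}


def cmake_commands(backend: Optional[str] = None,
                   build_dir: str = "build",
                   jobs: Optional[int] = None) -> List[List[str]]:
    """Return the configure and build argument lists (hypothetical helper)."""
    configure = ["cmake", "-B", build_dir, "-DCMAKE_BUILD_TYPE=Release"]
    if backend is not None:
        if backend not in GPU_FLAGS:
            raise ValueError(f"unknown backend: {backend}")
        configure.append(GPU_FLAGS[backend])
    # Portable stand-in for $(nproc), which is not available on every platform.
    jobs = jobs or os.cpu_count() or 1
    build = ["cmake", "--build", build_dir, "-j", str(jobs)]
    return [configure, build]
```

Each returned list can be passed to subprocess.run(args, check=True) in order; a CPU-only build is obtained by leaving backend unset.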

Highlighted Details

  • Supports over 26 ASR backends and 9 TTS engines with extensive multilingual capabilities.
  • Features a single C++ binary with no Python/PyTorch runtime dependencies.
  • Offers a CLI, an OpenAI-compatible HTTP server, and bindings for Python, Rust, and Dart.
  • Includes advanced functionalities like speaker diarization, hotword detection, universal forced alignment, speech translation, and Voice Activity Detection (VAD).
  • Models can be auto-downloaded using the -m auto flag.
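To give a sense of what "OpenAI-compatible" could mean for a client, here is a minimal sketch that builds (but does not send) a transcription request in the shape the OpenAI audio API uses: a multipart POST to /v1/audio/transcriptions with file and model fields. Whether CrispASR's server exposes exactly this path is an assumption, as are the host, port, and the placeholder model name; the function build_transcription_request is hypothetical.

```python
import urllib.request
import uuid


def build_transcription_request(audio_bytes: bytes,
                                filename: str,
                                base_url: str = "http://localhost:8080",  # assumed host/port
                                model: str = "default",                   # placeholder model name
                                ) -> urllib.request.Request:
    """Build a multipart/form-data POST in the OpenAI transcription style."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        # Plain form field carrying the model name.
        (f"--{boundary}\r\n"
         'Content-Disposition: form-data; name="model"\r\n\r\n'
         f"{model}\r\n").encode(),
        # File field carrying the raw audio bytes.
        (f"--{boundary}\r\n"
         f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
         "Content-Type: application/octet-stream\r\n\r\n").encode(),
        audio_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",  # path taken from the OpenAI API, assumed here
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Sending the request with urllib.request.urlopen(req) against a running server would return the response body; check the project's server documentation for the actual endpoint, port, and accepted fields before relying on this shape.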

Maintenance & Community

The provided documentation does not detail specific community channels (e.g., Discord, Slack), roadmaps, or notable contributors/sponsorships.

Licensing & Compatibility

The CrispASR binary is released under the MIT license, mirroring its whisper.cpp origins. Model weights are governed by their respective HuggingFace licenses, which are predominantly permissive (MIT, Apache-2.0, CC-BY-4.0), suggesting broad compatibility for commercial use, though individual model licenses require review.

Limitations & Caveats

On CPU-only hardware, complex Speech-LLM backends can run significantly slower than real time. Some features, such as specific VAD providers, are noted as experimental. The project lists ongoing development plans and feature additions, so the API surface may still change.

Health Check

  • Last commit: 15 hours ago
  • Responsiveness: Inactive
  • Pull requests (30d): 27
  • Issues (30d): 48
  • Star history: 112 stars in the last 30 days
