FastASR  by chenkui164

C++ ASR inference project for ARM platforms

Created 3 years ago
539 stars

Top 59.1% on SourcePulse

Project Summary

FastASR is a C++-based Automatic Speech Recognition (ASR) inference engine designed for high performance and minimal dependencies, targeting developers and researchers needing efficient ASR on various platforms, including ARM devices like the Raspberry Pi 4B. It offers near-commercial-grade accuracy by leveraging optimized Transformer models trained on extensive datasets, providing a fast and accurate solution for speech-to-text tasks.

How It Works

This project implements ASR inference purely in C++, with no dependency on deep learning frameworks such as PyTorch or TensorFlow. This allows CPU optimization tailored to specific architectures. By minimizing data copying and relying on pointer-based algorithms, FastASR achieves faster inference than framework-based solutions, particularly on resource-constrained devices. It supports both non-streaming and streaming models; for non-streaming models, VAD (voice activity detection) splits long audio so it can be processed in segments.
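The role of VAD in long-audio processing can be illustrated with a minimal sketch. This is not FastASR's VAD; it is a simple energy-threshold approach, and the 16 kHz sample rate, the function names, the threshold, and the silence length are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of each full, non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def split_on_silence(
    samples: Sequence[float],
    frame_len: int = 160,          # 10 ms at an assumed 16 kHz sample rate
    threshold: float = 1e-4,       # hypothetical energy threshold
    min_silence_frames: int = 30,  # length of silence that closes a segment
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of detected speech segments."""
    segments: List[Tuple[int, int]] = []
    seg_start = None
    silence_run = 0
    energies = frame_energies(samples, frame_len)
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if seg_start is None:
                seg_start = i * frame_len
            silence_run = 0
        elif seg_start is not None:
            silence_run += 1
            if silence_run >= min_silence_frames:
                # End the segment right after the last voiced frame.
                end = (i - silence_run + 1) * frame_len
                segments.append((seg_start, end))
                seg_start = None
                silence_run = 0
    if seg_start is not None:
        # Audio ended inside a segment: close it after the last voiced frame.
        end = (len(energies) - silence_run) * frame_len
        segments.append((seg_start, end))
    return segments
```

Each returned range could then be passed to a non-streaming recognizer on its own, which keeps the input length per call bounded regardless of how long the recording is.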

Quick Start & Requirements

  • Install: pip install fastasr for Python users. Source compilation is also supported for C++ integration and custom builds.
  • Prerequisites: CPython 3.6-3.11, libfftw3, libopenblas. For Raspberry Pi 4B optimization, a 64-bit OS and recompilation of dependencies are recommended. Pre-trained models must be downloaded separately.
  • Setup: Installation via pip is straightforward. Compiling from source and downloading models may take longer depending on system resources and network speed.
  • Links: Example Usage
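Before installing, it can help to check the prerequisites listed above. The following is a rough sketch using only the Python standard library; the helper names are hypothetical, and the short library names `fftw3` and `openblas` are assumptions about how libfftw3 and libopenblas are registered on a given system.

```python
import sys
from ctypes.util import find_library
from typing import Dict, Optional, Tuple


def python_supported(version: Tuple[int, int]) -> bool:
    """True if (major, minor) falls in the CPython 3.6-3.11 range listed above."""
    return (3, 6) <= version <= (3, 11)


def find_native_libs() -> Dict[str, Optional[str]]:
    """Map each assumed library short name to its resolved name, or None."""
    return {name: find_library(name) for name in ("fftw3", "openblas")}


if __name__ == "__main__":
    current = (sys.version_info.major, sys.version_info.minor)
    print("Python version supported:", python_supported(current))
    for name, resolved in find_native_libs().items():
        print(name, "->", resolved if resolved else "not found")
```

A result of "not found" only means the dynamic loader could not locate the library under that name; a library built from source in a non-standard prefix may still be usable.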

Highlighted Details

  • Supports four models: Paraformer, k2_rnnt2, conformer, and conformer_online (streaming).
  • Achieves real-time performance on ARM platforms like Raspberry Pi 4B.
  • Offers both C++ static library (libfastasr.a) and Python module (PyFastASR) interfaces.
  • Models are trained on WenetSpeech (10000+ hours) and private Alibaba datasets (60000+ hours).
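"Real-time performance" is commonly quantified as a real-time factor (RTF): processing time divided by audio duration, with values below 1.0 meaning the recognizer keeps up with the audio. The sketch below shows how this might be measured; `recognize` is a hypothetical callable standing in for any recognizer, and the 16 kHz sample rate is an assumption.

```python
import time
from typing import Callable, Sequence


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = processing time / audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def measure_rtf(
    recognize: Callable[[Sequence[float]], object],  # hypothetical recognizer
    samples: Sequence[float],
    sample_rate: int = 16000,  # assumed sample rate
) -> float:
    """Time one call to `recognize` and return its real-time factor."""
    start = time.perf_counter()
    recognize(samples)
    elapsed = time.perf_counter() - start
    return real_time_factor(elapsed, len(samples) / sample_rate)
```

For example, 10 seconds of audio transcribed in 2 seconds gives an RTF of 0.2.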

Maintenance & Community

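The project is developed by chenkui164. The health check below reports the last commit as 2 years ago and responsiveness as inactive, so it does not appear to be under active development. Community engagement channels are not listed in the README.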

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

Model quantization and compression are still in progress. Adding punctuation requires a separate NLP model. The README notes that some models are large and slow, which can hurt client-side performance, though the C++ implementation aims to mitigate this.

Health Check

  • Last Commit: 2 years ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 6 stars in the last 30 days
