Fun-ASR by FunAudioLLM

Advanced speech recognition toolkit for global audio

Created 6 months ago

1,379 stars

Top 28.6% on SourcePulse

1 Expert Loves This Project

Jeremy Howard

Cofounder of fast.ai

Project Summary

Fun-ASR is an end-to-end large speech recognition model from Tongyi Lab, designed for high-precision, multi-language transcription. It targets developers and researchers who need robust ASR in challenging acoustic environments or specialized domains, and it offers low-latency real-time transcription along with extensive dialect and accent support.

How It Works

The model is trained end to end on tens of millions of hours of real speech data. It includes optimizations for far-field, high-noise audio, where recognition accuracy reaches up to 93%. It supports 7 Chinese dialects and 26 regional accents, recognizes 31 languages (including speech that mixes languages), and improves transcription of lyrics sung over background music.
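The summary does not say how the 93% accuracy figure is measured. A common convention in ASR, assumed here purely for illustration, is to report accuracy as 1 minus the word error rate (or the character error rate for languages such as Chinese), where the error rate is the edit distance between the reference and the recognized text divided by the reference length. The function names below are hypothetical and are not part of Fun-ASR.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (each edit costs 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # token in ref missing from hyp (deletion)
                curr[j - 1] + 1,     # extra token in hyp (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def error_rate(reference, hypothesis, unit="word"):
    """Word error rate (unit="word") or character error rate (unit="char")."""
    if unit == "word":
        ref, hyp = reference.split(), hypothesis.split()
    else:
        ref, hyp = list(reference), list(hypothesis)
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def accuracy(reference, hypothesis, unit="word"):
    """Assumed convention: accuracy = 1 - error rate (can go below 0)."""
    return 1.0 - error_rate(reference, hypothesis, unit)
```

For example, if the reference is "the cat sat on the mat" and the recognizer drops one word, the word error rate is 1/6 (about 0.167). Under this assumed convention, 93% accuracy would correspond to an error rate of 7%.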

Quick Start & Requirements

To install, clone the repository (https://github.com/FunAudioLLM/Fun-ASR.git), change into its directory, and run pip install -r requirements.txt. A GPU device (e.g., cuda:0) is recommended for inference. Online demos are linked from ModelScope and Hugging Face Spaces.
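The installation steps above can be sketched as a small Python script. This is a minimal illustration, not the project's own tooling: the helper names are hypothetical, pip is invoked through the current interpreter (python -m pip) rather than a bare pip command, and the device check assumes the model runs on PyTorch, which the summary implies through the cuda:0 device string but does not state.

```python
import subprocess
import sys

REPO_URL = "https://github.com/FunAudioLLM/Fun-ASR.git"


def repo_dir_name(repo_url):
    """Directory that `git clone` creates by default for this URL."""
    name = repo_url.rstrip("/").rsplit("/", 1)[-1]
    return name[:-4] if name.endswith(".git") else name


def install_steps(repo_url=REPO_URL):
    """Return (command, working_directory) pairs for the install."""
    repo_dir = repo_dir_name(repo_url)
    return [
        (["git", "clone", repo_url], None),
        ([sys.executable, "-m", "pip", "install", "-r", "requirements.txt"],
         repo_dir),
    ]


def run_steps(steps):
    """Run each command in order, stopping at the first failure."""
    for cmd, cwd in steps:
        subprocess.run(cmd, cwd=cwd, check=True)


def pick_device():
    """Prefer the first CUDA GPU; fall back to CPU.

    Assumption: PyTorch is the inference backend.
    """
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"
```

Calling run_steps(install_steps()) performs the clone and dependency install. pick_device() returns a device string to use for inference; how that string is passed to the model depends on the repository's own inference script.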

Highlighted Details

Supports 31 languages, with extensive coverage of Chinese dialects (7) and regional accents (26).
Achieves up to 93% accuracy in far-field, high-noise environments.
Includes specialized modules for recognizing rap speech and lyrics sung over background music.
Offers low-latency real-time transcription capabilities.

Maintenance & Community

Online demos and community interaction are hosted on the ModelScope Community Space and Hugging Face Spaces. The project is associated with Tongyi Lab.

Licensing & Compatibility

The README does not specify a software license. Licensing terms need to be confirmed before commercial use or integration into closed-source projects.

Limitations & Caveats

The project lists several outstanding TODO items: returning timestamps, speaker diarization, and model training. The current release focuses primarily on inference.

Health Check

Last Commit

4 days ago

Responsiveness

Inactive

Star History

120 stars in the last 30 days