FunAudioLLM
Advanced speech recognition toolkit for global audio
Fun-ASR is an end-to-end large speech recognition model from Tongyi Lab, designed for high-precision, multi-language transcription. It targets developers and researchers needing robust ASR capabilities, especially in challenging environments or for specialized domains, offering benefits like low-latency real-time transcription and extensive dialect/accent support.
How It Works
The system uses an end-to-end architecture trained on tens of millions of hours of real speech data. It includes specialized optimizations for far-field, high-noise scenarios, with reported accuracy of up to 93%. Notable features include support for 7 Chinese dialects and 26 regional accents, recognition of 31 languages including mixed-language speech, and improved transcription of lyrics over background music.
Quick Start & Requirements
To install, clone the repository (https://github.com/FunAudioLLM/Fun-ASR.git), change into the directory, and run pip install -r requirements.txt. GPU acceleration (e.g., cuda:0) is recommended for inference. Online demos are linked from ModelScope and Hugging Face Spaces.
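Because a GPU is recommended but not always present, a small helper can choose the device string before inference. The following is a minimal sketch, not code from the repository: it assumes the model runs on PyTorch (the cuda:0 naming suggests this, but the summary does not state it), and pick_device is a hypothetical helper name.

```python
# Setup described above (run in a shell before using this helper):
#   git clone https://github.com/FunAudioLLM/Fun-ASR.git
#   cd Fun-ASR
#   pip install -r requirements.txt

import importlib.util


def pick_device(preferred: str = "cuda:0") -> str:
    """Return `preferred` when PyTorch reports a usable CUDA GPU, else "cpu".

    Hypothetical helper: assumes the toolkit accepts PyTorch-style
    device strings such as "cuda:0" or "cpu".
    """
    # If PyTorch is not installed at all, fall back to CPU.
    if importlib.util.find_spec("torch") is None:
        return "cpu"

    import torch

    # torch.cuda.is_available() is True only when a CUDA device can be used.
    return preferred if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

The returned string can then be passed wherever the repository's own inference example expects a device value. Check the README for the exact parameter name.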
Maintenance & Community
Online demos and community interaction are hosted on the ModelScope Community Space and Hugging Face Spaces. The project is associated with Tongyi Lab.
Licensing & Compatibility
The provided README does not specify a software license. Licensing terms should be clarified with the maintainers before commercial use or integration into closed-source projects.
Limitations & Caveats
Outstanding TODO items include support for returning timestamps, speaker diarization, and model training. The current focus is primarily on inference.