FunAudioLLM
Advanced speech recognition toolkit for global audio
Top 40.4% on SourcePulse
Fun-ASR is an end-to-end large speech recognition model from Tongyi Lab, designed for high-precision, multi-language transcription. It targets developers and researchers needing robust ASR capabilities, especially in challenging environments or for specialized domains, offering benefits like low-latency real-time transcription and extensive dialect/accent support.
How It Works
The system employs an end-to-end architecture trained on tens of millions of hours of real speech data. It includes specialized optimizations for far-field, high-noise scenarios, where it achieves up to 93% accuracy. It supports 7 Chinese dialects and 26 regional accents, recognizes 31 languages (including mixed-language speech), and improves lyric transcription over background music.
Quick Start & Requirements
Installation involves cloning the repository (https://github.com/FunAudioLLM/Fun-ASR.git), navigating into the directory, and running pip install -r requirements.txt. GPU acceleration (e.g., cuda:0) is recommended for inference. Links to online demos are available via ModelScope and Huggingface Spaces.
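The GPU recommendation above can be paired with a CPU fallback so the same script runs on machines without CUDA. The following is a minimal sketch, not code from the repository: it assumes the project runs on PyTorch (suggested by the cuda:0 device string but not stated in the README), and pick_device is a hypothetical helper name.

```python
def pick_device(preferred: str = "cuda:0") -> str:
    """Return `preferred` if PyTorch reports a usable CUDA GPU, else "cpu".

    Assumption: the toolkit accepts PyTorch-style device strings.
    """
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        # No PyTorch installed: CPU is the only safe answer.
        return "cpu"
    return preferred if torch.cuda.is_available() else "cpu"


device = pick_device()
print(f"Running inference on: {device}")
```

The resulting string can then be passed wherever the project's inference code expects a device argument.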
Maintenance & Community
Community interaction and online experiences are facilitated through ModelScope Community Space and Huggingface Spaces. The project is associated with Tongyi Lab.
Licensing & Compatibility
The README does not specify a software license. The licensing terms should be clarified before commercial use or integration into closed-source projects.
Limitations & Caveats
The project has outstanding TODO items including support for returning timestamps, speaker diarization, and model training. The current focus is primarily on inference.