Speech models for Chinese ASR tasks
Top 33.3% on SourcePulse
This repository provides pre-trained wav2vec 2.0 and HuBERT models for Chinese speech. It targets researchers and developers working on Chinese automatic speech recognition (ASR) and related speech-processing tasks, and reports lower character error rates (CER) than traditional FBank features.
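CER is the character-level edit distance between a recognition hypothesis and its reference transcript, divided by the number of reference characters. The sketch below computes it with a standard Levenshtein dynamic program. Dropping whitespace before scoring is an assumption (a common convention for Chinese text), and the function names are illustrative, not part of the repository.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions, deletions, insertions)."""
    # prev[j] holds the distance between the reference prefix seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference character r
                curr[j - 1] + 1,         # insert hypothesis character h
                prev[j - 1] + (r != h),  # substitute (costs 0 when characters match)
            )
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """CER = (S + D + I) / N, with N the number of reference characters."""
    # Assumption: whitespace is not scored, as is common for Chinese transcripts.
    ref = "".join(ref.split())
    hyp = "".join(hyp.split())
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


# One substitution over six reference characters.
print(f"{cer('今天天气很好', '今天天气真好'):.3f}")  # 0.167
```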
How It Works
The project uses the Fairseq toolkit to train wav2vec 2.0 and HuBERT models on 10,000 hours of diverse Chinese speech from WenetSpeech. Because the approach is self-supervised, the models learn robust speech representations from unlabeled audio and can then serve as feature extractors for downstream ASR tasks. To use them in an ASR architecture such as Conformer, the hidden-layer representations are summed and fed to the model in place of conventional acoustic features such as FBank.
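A minimal sketch of the layer-summing step, written in plain Python so it runs without dependencies. It assumes a softmax-weighted sum with one learnable scalar per layer, a common variant of layer summing; the source only says the hidden layers are summed, so the weighting scheme and the names `softmax` and `weighted_layer_sum` are hypothetical. With equal logits the result is the plain average, i.e. the unweighted sum divided by the number of layers.

```python
import math
from typing import List

Matrix = List[List[float]]  # shape: frames x feature_dim


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scalars."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_layer_sum(hidden_states: List[Matrix], layer_logits: List[float]) -> Matrix:
    """Combine per-layer encoder outputs into a single feature matrix.

    hidden_states[l][t][d] is feature d of frame t from layer l.
    layer_logits holds one scalar per layer. Assumption: in a real system these
    would be trainable parameters learned jointly with the downstream ASR model.
    """
    if len(hidden_states) != len(layer_logits):
        raise ValueError("need exactly one logit per layer")
    weights = softmax(layer_logits)
    num_frames = len(hidden_states[0])
    dim = len(hidden_states[0][0])
    out = [[0.0] * dim for _ in range(num_frames)]
    # Accumulate each layer's features, scaled by that layer's weight.
    for w, layer in zip(weights, hidden_states):
        for t in range(num_frames):
            for d in range(dim):
                out[t][d] += w * layer[t][d]
    return out


# Two layers, one frame, two feature dimensions; equal logits give the average.
print(weighted_layer_sum([[[1.0, 2.0]], [[3.0, 4.0]]], [0.0, 0.0]))
```

In a real pipeline the per-layer matrices would come from the pre-trained encoder (for example, by passing `output_hidden_states=True` to a Hugging Face transformers model), and the resulting frames would replace FBank frames as the Conformer's input.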
Quick Start & Requirements
Requires the fairseq and transformers Python packages, plus soundfile. A GPU with CUDA is recommended for inference.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
Inactive; last updated 1 year ago.