DataoceanAI: Multilingual ASR model for diverse Eastern languages
A multilingual, multitask Automatic Speech Recognition (ASR) model developed by Dataocean AI and Tsinghua University, Dolphin addresses the need for accurate speech processing across diverse Eastern languages and Chinese dialects. It offers benefits in applications requiring robust speech-to-text capabilities, voice activity detection, segmentation, and language identification, targeting researchers and developers working with non-English audio data.
How It Works
Dolphin employs a joint CTC-Attention architecture, pairing an E-Branchformer-based encoder with a standard Transformer decoder, and builds on established models such as Whisper and OWSM. A key innovation is its two-level language token system, which distinguishes between language (e.g., <zh>) and region (e.g., <CN>), enabling finer-grained handling of linguistic and regional variation across its large, dialect-rich training data.
Quick Start & Requirements
Install from PyPI with pip install -U dataoceanai-dolphin, or from source with pip install git+https://github.com/SpeechOceanTech/Dolphin.git.
Highlighted Details
Maintenance & Community
The README does not list contributors, sponsorships, or community channels.
Licensing & Compatibility
Limitations & Caveats
Dolphin does not support translation tasks and, unlike Whisper, does not condition on previous text or use the associated prompt tokens. The medium and large models have not yet been publicly released.