Dolphin by DataoceanAI

Multilingual ASR model for diverse Eastern languages

Created 11 months ago

696 stars

Top 49.0% on SourcePulse

Project Summary

Dolphin is a multilingual, multitask Automatic Speech Recognition (ASR) model developed by Dataocean AI and Tsinghua University for accurate speech processing across diverse Eastern languages and Chinese dialects. Beyond speech-to-text, it handles voice activity detection, segmentation, and language identification. It targets researchers and developers working with non-English audio data.

How It Works

Dolphin uses a joint CTC-Attention architecture with an E-Branchformer-based encoder and a standard Transformer decoder, building on established models such as Whisper and OWSM. A key innovation is its two-level language token system, which separates the language token (e.g., <zh>) from the region token (e.g., <CN>). This allows finer-grained handling of linguistic and regional variation, which is particularly beneficial for large datasets.
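The two-level token idea can be illustrated with a minimal sketch. It assumes the prefix is simply the language token followed by the region token; only <zh> and <CN> come from the summary above. The function names, the lowercase/uppercase convention, and the <ja><JP> pair are hypothetical and are not taken from Dolphin's actual vocabulary or code.

```python
import re

# Assumed convention (not confirmed by the source): lowercase language code
# followed by an uppercase region code, each wrapped in angle brackets.
_PREFIX_PATTERN = re.compile(r"<([a-z]+)><([A-Z]+)>")


def language_prefix(language: str, region: str) -> str:
    """Build a two-level prefix such as '<zh><CN>' (hypothetical helper)."""
    if not language.isalpha() or not region.isalpha():
        raise ValueError("language and region must be alphabetic codes")
    return f"<{language.lower()}><{region.upper()}>"


def parse_prefix(prefix: str) -> tuple[str, str]:
    """Split a two-level prefix back into (language, region)."""
    match = _PREFIX_PATTERN.fullmatch(prefix)
    if match is None:
        raise ValueError(f"not a two-level language prefix: {prefix!r}")
    return match.group(1), match.group(2)


if __name__ == "__main__":
    print(language_prefix("zh", "CN"))   # language + region from the summary
    print(language_prefix("ja", "JP"))   # illustrative pair, an assumption
    print(parse_prefix("<zh><CN>"))
```

Keeping language and region as separate tokens means the same language token can be shared across regions, while the region token carries the dialect or accent distinction.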

Quick Start & Requirements

Installation: pip install -U dataoceanai-dolphin or from source: pip install git+https://github.com/SpeechOceanTech/Dolphin.git.
Prerequisites: FFmpeg is required for audio conversion to WAV format.
Dependencies: Supports CUDA, Apple MPS, Huawei Ascend NPU (requires specific torch_npu and CANN versions), and CPU.
Resources: Model sizes range from 140M (base) to 1679M (large) parameters.
Docs: The README links to the paper, Huggingface, Modelscope, Openi, and Wisemodel.
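Because FFmpeg is needed to convert audio to WAV, a preprocessing step might look like the sketch below. It only uses standard FFmpeg options (-y, -i, -ac, -ar). The 16 kHz mono target and the helper names are assumptions for illustration; the summary does not state the sample rate or channel layout Dolphin expects.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src: str, dst: str, sample_rate: int = 16000) -> list[str]:
    """Return an FFmpeg command that converts `src` to a mono WAV file.

    The 16 kHz default is an assumption, not a documented requirement.
    """
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", str(src),          # input file in any format FFmpeg can decode
        "-ac", "1",              # downmix to a single channel
        "-ar", str(sample_rate), # resample to the target rate
        str(dst),
    ]


def convert_to_wav(src: str, dst: str, sample_rate: int = 16000) -> Path:
    """Run FFmpeg and return the output path; raises if FFmpeg fails."""
    if Path(src).resolve() == Path(dst).resolve():
        raise ValueError("source and destination must be different files")
    subprocess.run(build_ffmpeg_command(src, dst, sample_rate), check=True)
    return Path(dst)


if __name__ == "__main__":
    # Print the command only; call convert_to_wav() to actually run FFmpeg.
    print(" ".join(build_ffmpeg_command("speech.mp3", "speech.wav")))
```

Separating command construction from execution makes the conversion easy to inspect or log before FFmpeg is invoked.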

Highlighted Details

Supports 40 Eastern languages and 22 Chinese dialects.
Performs Automatic Speech Recognition (ASR), Voice Activity Detection (VAD), segmentation, and Language Identification (LID).
Available models include 'base' (140M, 33.3% WER) and 'small' (372M, 25.2% WER).
Trained on over 210,000 hours of proprietary and open-source data.
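The WER figures quoted for the base and small models are word error rates. As a reminder of how the metric is generally computed, here is a minimal sketch of the standard definition: word-level edit distance divided by the number of reference words. It assumes whitespace tokenization, which is a simplification; the summary does not say how Dolphin's evaluation tokenizes or normalizes text, and the function name is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Compute WER = (substitutions + deletions + insertions) / reference words.

    Uses whitespace tokenization, which is an assumption for illustration.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

Under this definition, a lower value is better, so the small model's 25.2% indicates fewer word-level errors than the base model's 33.3%.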

Maintenance & Community

The README does not give specific details on contributors, sponsorships, or community channels.

Licensing & Compatibility

License: Apache 2.0.
Compatibility: The Apache 2.0 license generally permits commercial use and integration into closed-source projects.

Limitations & Caveats

Dolphin does not support translation tasks, and it does not use previous text or its related tokens. The medium and large models are not yet publicly available.
