FunASR is a comprehensive, end-to-end speech recognition toolkit designed for both academic research and industrial applications. It provides a unified platform for speech recognition (ASR), voice activity detection (VAD), punctuation restoration, and other speech processing tasks, enabling researchers and developers to build and deploy ASR systems efficiently.
How It Works
FunASR uses a modular architecture that lets users combine pre-trained models for different speech tasks into a single pipeline. It supports both non-streaming and streaming inference, with models such as Paraformer (a non-autoregressive parallel transformer) providing high accuracy and efficiency. The toolkit also supports fine-tuning on custom datasets and offers deployment options for real-time and file-based transcription services.
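A minimal, non-streaming sketch of this modular combination, assuming the `funasr` package is installed; the model names (`paraformer-zh`, `fsmn-vad`, `ct-punc`) are examples from the Model Zoo and `asr_example.wav` is a placeholder for a local audio file:

```python
from funasr import AutoModel

# Combine an ASR model with VAD and punctuation models into one pipeline.
# The model names below are examples; swap in any compatible models from the Model Zoo.
model = AutoModel(
    model="paraformer-zh",   # Paraformer ASR (non-autoregressive, non-streaming)
    vad_model="fsmn-vad",    # voice activity detection for segmenting long audio
    punc_model="ct-punc",    # punctuation restoration on the raw transcript
)

# Transcribe a local audio file; the result is a list of dicts with a "text" field.
res = model.generate(input="asr_example.wav", batch_size_s=300)
print(res[0]["text"])
```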
Quick Start & Requirements
- Install: `pip3 install -U funasr`
- Prerequisites: Python >= 3.8, PyTorch >= 1.13, torchaudio. Optional: `modelscope`, `huggingface_hub`.
- Resources: GPU recommended for optimal performance.
- Docs: Tutorial, Demo Examples
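As a quick check of the environment (and of the GPU recommendation above), a hedged sketch, assuming the `device` argument behaves as in current FunASR releases and that `your_audio.wav` stands in for a local recording you supply:

```python
import torch
from funasr import AutoModel

# Pick a device based on what the install can see; a GPU is recommended but optional.
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load a single ASR model as an install smoke test (no VAD/punctuation chaining here).
model = AutoModel(model="paraformer-zh", device=device)

# "your_audio.wav" is a placeholder for any local 16 kHz mono recording.
res = model.generate(input="your_audio.wav")
print(res)
```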
Highlighted Details
- Supports a wide range of speech tasks: ASR, VAD, Punctuation Restoration, Language Models, Speaker Verification, Speaker Diarization, Speech Emotion Recognition, and Keyword Spotting.
- Offers a vast Model Zoo with numerous pre-trained models on ModelScope and Hugging Face, including SenseVoiceSmall, Paraformer variants, Whisper-large-v3, and Qwen-Audio.
- Provides both non-streaming and streaming inference with configurable latency parameters (see the streaming sketch after this list).
- Includes services for offline file transcription (CPU/GPU) and real-time transcription.
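For the streaming path noted above, a sketch of chunked inference, assuming the `paraformer-zh-streaming` model name and the chunk-size convention (`[0, 10, 5]`, roughly 600 ms of latency) used in FunASR's examples; the wav path and frame arithmetic are illustrative:

```python
import soundfile
from funasr import AutoModel

# [0, 10, 5]: 10-frame attention chunks (~600 ms latency) with 5 frames of lookahead.
chunk_size = [0, 10, 5]
encoder_chunk_look_back = 4   # how many previous encoder chunks to attend back to
decoder_chunk_look_back = 1   # how many encoder chunks the decoder looks back at

model = AutoModel(model="paraformer-zh-streaming")

speech, sample_rate = soundfile.read("your_audio.wav")  # 16 kHz mono placeholder
chunk_stride = chunk_size[1] * 960                      # samples per ~600 ms chunk at 16 kHz

cache = {}  # carries encoder/decoder state between chunks
total_chunks = int((len(speech) - 1) / chunk_stride) + 1
for i in range(total_chunks):
    chunk = speech[i * chunk_stride:(i + 1) * chunk_stride]
    is_final = i == total_chunks - 1
    res = model.generate(
        input=chunk,
        cache=cache,
        is_final=is_final,
        chunk_size=chunk_size,
        encoder_chunk_look_back=encoder_chunk_look_back,
        decoder_chunk_look_back=decoder_chunk_look_back,
    )
    print(res)  # partial hypotheses arrive chunk by chunk
```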
Maintenance & Community
- Active development with frequent updates (e.g., new model support, service releases).
- Community support via GitHub Issues and a DingTalk group.
- Key contributors include researchers from Alibaba DAMO Academy.
Licensing & Compatibility
- License: MIT License for the toolkit. Pre-trained models are subject to their own Model License Agreement.
- Compatibility: Generally compatible with commercial use, but model-specific licenses should be reviewed.
Limitations & Caveats
- Some advanced features or specific model deployments might still be in progress or have limited documentation in English.
- Performance and resource requirements can vary significantly based on the chosen model and task.