vakyansh-models by Open-Speech-EkStep

Open-source speech models for Indic languages

Created 5 years ago

324 stars

Top 84.3% on SourcePulse

2 Experts Love This Project

Taranjeet Singh (taranjeet), Cofounder of Mem0

Omar Sanseviero (osanseviero), DevRel at Google DeepMind

Project Summary

This repository provides a suite of open-source speech processing models for Indic languages, focused primarily on Automatic Speech Recognition (ASR) and Text-to-Speech (TTS). It targets researchers and developers working with low-resource languages, and offers pretrained and fine-tuned acoustic models, language models, and ancillary models for tasks such as punctuation and gender classification.

How It Works

The project builds on the Conformer and wav2vec2 architectures. Pretrained models such as Vakyansh-Conformer-SSL (34,000 hours across 39 Indian languages) and CLSRIL-23 (10,000 hours across 23 Indic languages) serve as foundations and are then fine-tuned for specific languages; hindi_large_ssl_2500 and kannada_large_ssl_1000 are examples of this approach. 5-gram language models built with KenLM are provided to improve ASR accuracy. The TTS models combine Glow-TTS with HiFi-GAN.
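To illustrate how an n-gram language model can improve ASR output, here is a minimal Python sketch of hypothesis rescoring. It is not the project's code and it does not use KenLM: `ToyNgramLM`, `rescore`, the `lm_weight` value, the tiny corpus, and the acoustic scores are all hypothetical, and the toy model uses add-one smoothing where KenLM uses modified Kneser-Ney. The idea it shows is only the general one: combine each candidate transcript's acoustic score with a language-model score and keep the best total.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Yield n-grams over tokens padded with start and end markers."""
    padded = ["<s>"] * (n - 1) + tokens + ["</s>"]
    for i in range(len(padded) - n + 1):
        yield tuple(padded[i:i + n])


class ToyNgramLM:
    """Add-one smoothed n-gram model (illustrative only, not KenLM)."""

    def __init__(self, n=5):
        self.n = n
        self.ngram_counts = Counter()
        self.context_counts = Counter()
        self.vocab = {"</s>"}

    def train(self, sentences):
        for sentence in sentences:
            tokens = sentence.split()
            self.vocab.update(tokens)
            for gram in ngrams(tokens, self.n):
                self.ngram_counts[gram] += 1
                self.context_counts[gram[:-1]] += 1

    def log_prob(self, sentence):
        """Natural-log probability of a whitespace-tokenized sentence."""
        total = 0.0
        vocab_size = len(self.vocab)
        for gram in ngrams(sentence.split(), self.n):
            numerator = self.ngram_counts[gram] + 1
            denominator = self.context_counts[gram[:-1]] + vocab_size
            total += math.log(numerator / denominator)
        return total


def rescore(hypotheses, lm, lm_weight=0.5):
    """Return the (text, acoustic_score) pair with the best combined score."""
    return max(hypotheses, key=lambda h: h[1] + lm_weight * lm.log_prob(h[0]))


# Hypothetical training text and ASR candidates (acoustic log-scores made up).
lm = ToyNgramLM(n=5)
lm.train(["the cat sat on the mat", "the dog sat on the rug"])

hypotheses = [
    ("the cat sad on the mat", -4.0),  # slightly better acoustic score
    ("the cat sat on the mat", -4.2),  # more plausible word sequence
]
best = rescore(hypotheses, lm)
print(best[0])
```

On acoustic scores alone the first candidate would win; with the language-model term added, the second candidate is selected because its word sequence was seen in the training text. In a real pipeline the language model would be a KenLM model file and the candidates would come from a beam-search decoder.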

Quick Start & Requirements

Installation and usage details are not explicitly provided in the README.
Models are available for download, implying manual integration or use with a framework like Hugging Face Transformers.
Specific hardware requirements (e.g., GPU, CUDA) are not detailed but are likely necessary for efficient inference and fine-tuning.
Links to datasets and research papers are provided for context.

Highlighted Details

Extensive coverage of 39 Indian languages for ASR.
Pretrained Conformer-SSL model trained on 34,000 hours of diverse audio data.
CLSRIL-23 model offers cross-lingual speech representations for 23 Indic languages.
Includes TTS models for multiple Indic languages, built with Glow-TTS and HiFi-GAN.

Maintenance & Community

The project is associated with AI4Bharat and IIT Madras (IITM), indicating academic backing.
Citations for key models (Vakyansh, CLSRIL-23) are provided, indicating that the models are grounded in published research.
No direct links to community channels (Discord, Slack) or a roadmap are present in the README.

Licensing & Compatibility

The README does not explicitly state a license. The project's open-source nature and its association with AI4Bharat suggest a permissive license such as Apache 2.0, but this is unverified and should be checked before use.
Suitability for commercial use is not specified.

Limitations & Caveats

Detailed installation and usage instructions are missing, requiring users to infer setup from model availability.
Specific hardware requirements for running the models are not listed.
The README does not mention any active development or community support channels.

Health Check

Last Commit: 3 years ago
Responsiveness: Inactive
Pull Requests (30d): 0
Issues (30d): 0
Star History: 7 stars in the last 30 days
