This repository provides a comprehensive suite of open-source speech processing models, primarily focused on Automatic Speech Recognition (ASR) and Text-to-Speech (TTS) for Indic languages. It targets researchers and developers working with low-resource languages, offering pretrained and fine-tuned models, language models, and ancillary tools like punctuation and gender classification.
How It Works
The project builds on state-of-the-art architectures such as Conformer and wav2vec2, trained on large speech corpora. Pretrained models, such as Vakyansh-Conformer-SSL (34,000 hours across 39 Indian languages) and CLSRIL-23 (10,000 hours across 23 Indic languages), serve as foundations. These are then fine-tuned for specific languages, as in the hindi_large_ssl_2500 and kannada_large_ssl_1000 models. Language models built as KenLM 5-grams are provided to improve ASR accuracy. TTS models combine Glow-TTS with a HiFi-GAN vocoder.
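To illustrate why an n-gram language model can improve ASR output, the following is a minimal sketch of hypothesis rescoring. It is not the repository's pipeline and does not use KenLM: it is a toy bigram model with add-one smoothing in pure Python. The function names (`train_ngram_counts`, `lm_log_prob`, `rescore`), the `lm_weight` parameter, the tiny romanized corpus, and the acoustic scores are all made up for illustration.

```python
import math
from collections import Counter


def train_ngram_counts(sentences, order=2):
    """Count n-grams and their (n-1)-token contexts from tokenized sentences."""
    ngrams, contexts = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] * (order - 1) + tokens + ["</s>"]
        for i in range(order - 1, len(padded)):
            context = tuple(padded[i - order + 1:i])
            ngrams[context + (padded[i],)] += 1
            contexts[context] += 1
    return ngrams, contexts


def lm_log_prob(tokens, ngrams, contexts, vocab_size, order=2):
    """Add-one smoothed log-probability of a token sequence (toy stand-in for a real LM)."""
    padded = ["<s>"] * (order - 1) + tokens + ["</s>"]
    total = 0.0
    for i in range(order - 1, len(padded)):
        context = tuple(padded[i - order + 1:i])
        count = ngrams[context + (padded[i],)]
        total += math.log((count + 1) / (contexts[context] + vocab_size))
    return total


def rescore(hypotheses, ngrams, contexts, vocab_size, lm_weight=0.5, order=2):
    """Return the hypothesis text with the best acoustic + weighted LM score."""
    def combined(hyp):
        text, acoustic_score = hyp
        lm_score = lm_log_prob(text.split(), ngrams, contexts, vocab_size, order)
        return acoustic_score + lm_weight * lm_score
    return max(hypotheses, key=combined)[0]


# Hypothetical training text and ASR hypotheses (log-domain acoustic scores).
corpus = [s.split() for s in [
    "mera naam ravi hai",
    "aapka naam kya hai",
    "mera ghar yahan hai",
]]
vocab_size = len({tok for sent in corpus for tok in sent} | {"</s>"})
ngrams, contexts = train_ngram_counts(corpus)

hypotheses = [
    ("mera naam ravi hai", -4.2),  # slightly worse acoustic score, fluent text
    ("mera nam ravi hai", -4.0),   # better acoustic score, contains an unseen word
]
print(rescore(hypotheses, ngrams, contexts, vocab_size))  # prints "mera naam ravi hai"
```

With `lm_weight=0` the acoustically preferred but misspelled hypothesis wins. Adding the language model score shifts the choice to the sequence that matches the training text. A real setup would score hypotheses with the provided KenLM models during beam search rather than with this toy model.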
Quick Start & Requirements
- Installation and usage details are not explicitly provided in the README.
- Models are available for download, implying manual integration or use with a framework like Hugging Face Transformers.
- Specific hardware requirements (e.g., GPU, CUDA) are not detailed but are likely necessary for efficient inference and fine-tuning.
- Links to datasets and research papers are provided for context.
Highlighted Details
- Extensive coverage of 39 Indian languages for ASR.
- Pretrained Conformer-SSL model trained on 34,000 hours of diverse audio data.
- CLSRIL-23 model offers cross-lingual speech representations for 23 Indic languages.
- Includes TTS models for multiple Indic languages using Glow-TTS and HiFi-GAN.
Maintenance & Community
- The project is associated with AI4Bharat and IITM, indicating strong academic backing.
- Citations for key models (Vakyansh, CLSRIL-23) are provided, suggesting active research.
- No direct links to community channels (Discord, Slack) or a roadmap are present in the README.
Licensing & Compatibility
- The README does not state a license. The AI4Bharat association and the open-source release suggest a permissive license (possibly Apache 2.0 or similar), but this is unverified and should be confirmed before reuse.
- Compatibility for commercial use is not specified.
Limitations & Caveats
- Detailed installation and usage instructions are missing, requiring users to infer setup from model availability.
- Specific hardware requirements for running the models are not listed.
- The README does not mention any active development or community support channels.