vakyansh-models by Open-Speech-EkStep

Open-source speech models for Indic languages

created 4 years ago
306 stars

Top 88.6% on sourcepulse

Project Summary

This repository provides open-source speech processing models for Indic languages, focused mainly on Automatic Speech Recognition (ASR) and Text-to-Speech (TTS). It targets researchers and developers working with low-resource languages and offers pretrained and fine-tuned models, language models, and ancillary models for punctuation and gender classification.

How It Works

The models are built on the Conformer and wav2vec 2.0 architectures. Pretrained models, such as Vakyansh-Conformer-SSL (34,000 hours across 39 Indian languages) and CLSRIL-23 (10,000 hours across 23 Indic languages), serve as foundations that are then fine-tuned for individual languages, as in hindi_large_ssl_2500 and kannada_large_ssl_1000. Language models, built as KenLM 5-grams, are provided to improve ASR accuracy. The TTS models combine Glow-TTS with HiFi-GAN.
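
To illustrate how a language model can improve ASR accuracy, the sketch below rescores an n-best list of hypotheses by adding a weighted language-model score to each acoustic score. It is a toy under stated assumptions: a bigram model with add-one smoothing stands in for the KenLM 5-gram models, and the function names, corpus, scores, and `lm_weight` value are all illustrative and not taken from the repository.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count bigrams and their contexts from whitespace-tokenized sentences."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for sentence in sentences:
        words = sentence.split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens[:-1], tokens[1:]))
    # Reserve one extra slot so unseen words still get some probability mass.
    return contexts, bigrams, len(vocab) + 1


def lm_log_prob(sentence, lm):
    """Add-one smoothed bigram log-probability of a sentence."""
    contexts, bigrams, vocab_size = lm
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens[:-1], tokens[1:]):
        total += math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size))
    return total


def rescore(nbest, lm, lm_weight=0.5):
    """Pick the hypothesis with the best acoustic + weighted LM score.

    nbest is a list of (text, acoustic_log_prob) pairs.
    """
    return max(nbest, key=lambda hyp: hyp[1] + lm_weight * lm_log_prob(hyp[0], lm))[0]


# Hypothetical data: the acoustic model slightly prefers a misrecognition.
lm = train_bigram_lm(["the cat sat", "the cat ran", "a cat sat"])
nbest = [("the cat sat", -4.2), ("the cut sat", -4.0)]

print(rescore(nbest, lm, lm_weight=0.0))  # acoustic score only: the cut sat
print(rescore(nbest, lm))                 # with the LM: the cat sat
```

With the language model switched off, the acoustically stronger but implausible hypothesis wins; with it on, the word sequence seen in the training text is preferred. A real decoder would typically apply the language model during beam search rather than only on a final n-best list.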

Quick Start & Requirements

  • Installation and usage details are not explicitly provided in the README.
  • Models are available for download, implying manual integration or use with a framework like Hugging Face Transformers.
  • Specific hardware requirements (e.g., GPU, CUDA) are not detailed but are likely necessary for efficient inference and fine-tuning.
  • Links to datasets and research papers are provided for context.
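
Since the README gives no setup steps, one low-cost check before trying any downloaded ASR model is to confirm the input audio format. The sketch below uses only the Python standard library. It assumes 16 kHz, mono, 16-bit WAV input, which is typical for wav2vec 2.0 models but is not confirmed for these checkpoints; `check_wav` and `write_silence` are hypothetical helper names.

```python
import os
import tempfile
import wave


def check_wav(path, expected_rate=16000, expected_channels=1):
    """Return a list of format problems; an empty list means the file matches.

    The expected values are assumptions; confirm them against the model's documentation.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != expected_rate:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected {expected_rate} Hz")
        if wav.getnchannels() != expected_channels:
            problems.append(f"{wav.getnchannels()} channels, expected {expected_channels}")
        if wav.getsampwidth() != 2:
            problems.append(f"sample width is {wav.getsampwidth()} bytes, expected 2 (16-bit)")
    return problems


def write_silence(path, rate, channels=1, seconds=0.1):
    """Write a short silent 16-bit WAV file, used here only to demonstrate the check."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(rate * seconds) * channels)


# Demonstration on generated files in a temporary directory.
tmp_dir = tempfile.mkdtemp()
good_path = os.path.join(tmp_dir, "good.wav")
bad_path = os.path.join(tmp_dir, "bad.wav")
write_silence(good_path, rate=16000)
write_silence(bad_path, rate=44100, channels=2)

print(check_wav(good_path))  # []
for problem in check_wav(bad_path):
    print(problem)
```

Files that fail the check would need resampling or downmixing with an audio tool before inference.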

Highlighted Details

  • Extensive coverage of 39 Indian languages for ASR.
  • Pretrained Conformer-SSL model trained on 34,000 hours of diverse audio data.
  • CLSRIL-23 model offers cross-lingual speech representations for 23 Indic languages.
  • Includes TTS models for multiple Indic languages using Glow-TTS and HiFi-GAN.

Maintenance & Community

  • The project is associated with AI4Bharat and IITM, indicating strong academic backing.
  • Citations for key models (Vakyansh, CLSRIL-23) are provided, indicating the models are backed by published research.
  • No direct links to community channels (Discord, Slack) or a roadmap are present in the README.

Licensing & Compatibility

  • The README does not state a license. The open-source nature of the project and its association with AI4Bharat may suggest a permissive license such as Apache 2.0, but this is unverified and should be checked in the repository before use.
  • Compatibility for commercial use is not specified.

Limitations & Caveats

  • Detailed installation and usage instructions are missing, requiring users to infer setup from model availability.
  • Specific hardware requirements for running the models are not listed.
  • The README does not mention any active development or community support channels.

Health Check

  • Last commit: 2 years ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 6 stars in the last 90 days