Habibi-TTS by SWivid

Open-source foundation for unified Arabic speech synthesis

Created 5 months ago

330 stars

Top 82.6% on SourcePulse

Project Summary

Summary

Habibi-TTS is an open-source foundation for unified-dialect Arabic text-to-speech (TTS), addressing the need for high-quality synthesis across Arabic dialects. It targets researchers and developers, pairing natural-sounding multi-dialect generation with benchmarking tools for evaluating the output.

How It Works

The project provides one unified model alongside specialized models for individual Arabic dialects. It performs zero-shot TTS inference from reference audio and text; the dialect can either be specified by the user or inferred from the prompt. Inference is available through a Gradio GUI for ease of use and through a command-line interface for flexibility, with advanced options configured via TOML files.

Quick Start & Requirements

Installation: pip install habibi-tts
Launch GUI: habibi-tts_infer-gradio
Prerequisites: Python and the PyTorch ecosystem (implied by the use of accelerate launch). No specific CUDA version is stated, though a GPU is implied for performance.
Documentation: Detailed installation and inference guidance is available in the F5-TTS documentation.
Links:
- Training/Finetuning: https://github.com/SWivid/Habibi-TTS/issues/2
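
Because the prerequisites above are only implied, a quick environment check can help before launching the GUI. The following is a minimal sketch, not part of Habibi-TTS: check_environment is a hypothetical helper, and the importable module name habibi_tts is an assumption (only the pip package name habibi-tts and the launcher habibi-tts_infer-gradio come from the README).

```python
import importlib.util
import shutil
import sys


def check_environment(packages=("torch", "accelerate", "habibi_tts")):
    """Report which prerequisites are visible to the current interpreter.

    Hypothetical helper. "habibi_tts" is an assumed module name for the
    "habibi-tts" pip package and may differ in practice.
    """
    report = {"python": sys.version.split()[0]}
    for name in packages:
        # find_spec returns None when a top-level module is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    # The GUI launcher named in the README should be on PATH after installation.
    report["gui_launcher"] = shutil.which("habibi-tts_infer-gradio") is not None
    if report.get("torch"):
        import torch  # imported lazily so the check also works without PyTorch

        report["cuda"] = torch.cuda.is_available()
    return report
```

Calling check_environment() returns a dictionary of booleans (plus the Python version string), so a missing package or launcher shows up as False rather than as an import error at inference time.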

Highlighted Details

Supports unified and specialized models for diverse Arabic dialects (MSA, SAU, UAE, ALG, IRQ, EGY, MAR, OMN, TUN, LEV, SDN, LBY).
Includes benchmarking tools for evaluating TTS performance: Word Error Rate (WER) computed with an ASR model (Meta Omnilingual-ASR-LLM-7B v1), speaker similarity (WavLM), and predicted Mean Opinion Score (MOS) via UTMOS.
Offers zero-shot inference capabilities, allowing generation from reference audio and text.
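
To make two of the metrics above concrete, here is a minimal sketch of how WER and a speaker-similarity score are commonly computed. It is not the project's benchmarking code: word_error_rate and cosine_similarity are illustrative names, the transcripts are assumed to be already normalized, and using cosine similarity between speaker embeddings is an assumption about how the WavLM-based score is derived.

```python
import math


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)
```

In a benchmark of this kind, the hypothesis would be the ASR transcript of the synthesized audio and the reference would be the input text, while the two vectors would be speaker embeddings of the reference audio and the generated audio. Lower WER and higher similarity are better.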

Maintenance & Community

No specific details on contributors, sponsorships, or community channels (like Discord/Slack) are provided in the README.

Licensing & Compatibility

Code: MIT License.
Models:
- Unified, SAU, UAE models: CC-BY-NC-SA-4.0 (restricted by SADA and Mixat).
- Specialized models (ALG, EGY, IRQ, MAR, MSA): Apache 2.0.
Compatibility: CC-BY-NC-SA-4.0 prohibits commercial use and requires derivative works to be shared under the same terms, whereas Apache 2.0 generally permits commercial use. Because the license differs by model, check which one applies to the specific model before deploying it.
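
The per-model license split above can be captured in a small lookup, which is handy when a pipeline selects models programmatically. This is a sketch only: MODEL_LICENSES and allows_commercial_use are hypothetical names, and the table simply restates the model list given in this section.

```python
NON_COMMERCIAL = "CC-BY-NC-SA-4.0"
PERMISSIVE = "Apache-2.0"

# Licenses as listed in this section; models not listed here are omitted
# on purpose rather than guessed.
MODEL_LICENSES = {
    "Unified": NON_COMMERCIAL,
    "SAU": NON_COMMERCIAL,
    "UAE": NON_COMMERCIAL,
    "ALG": PERMISSIVE,
    "EGY": PERMISSIVE,
    "IRQ": PERMISSIVE,
    "MAR": PERMISSIVE,
    "MSA": PERMISSIVE,
}


def allows_commercial_use(model: str) -> bool:
    """Return True when the listed license for `model` permits commercial use."""
    license_id = MODEL_LICENSES[model]  # raises KeyError for unlisted models
    return license_id != NON_COMMERCIAL
```

Raising KeyError for an unlisted model is a deliberate choice: an unknown license should stop a deployment rather than default to "allowed".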

Limitations & Caveats

The CC-BY-NC-SA-4.0 license on the Unified, SAU, and UAE models imposes non-commercial and share-alike restrictions, which may limit adoption in commercial products. The README does not detail hardware requirements such as GPU or CUDA version, although a GPU is implied for performance. It also defers to the external F5-TTS documentation for detailed installation, so the README itself appears to be a high-level overview.

Health Check

Last Commit

3 months ago

Responsiveness

Inactive

Star History

7 stars in the last 30 days