Habibi-TTS by SWivid

Open-source foundation for unified Arabic speech synthesis

Created 2 months ago
302 stars

Summary

Habibi-TTS provides an open-source foundation for unified-dialectal Arabic speech synthesis, addressing the need for high-quality TTS across various Arabic dialects. It targets researchers and developers, offering a flexible system for generating natural-sounding Arabic speech with support for multiple dialects and advanced evaluation metrics.

How It Works

The project provides one unified model alongside specialized models for individual Arabic dialects. It supports zero-shot TTS inference from a reference audio clip and its text: users can specify the target dialect explicitly or let the system infer it from the prompt. Inference is available through a Gradio GUI for ease of use and through a command-line interface for flexibility, with advanced options configurable via TOML files.

Quick Start & Requirements

  • Installation: pip install habibi-tts
  • Launch GUI: habibi-tts_infer-gradio
  • Prerequisites: Python and the PyTorch ecosystem (implied by the use of accelerate launch). The README does not state a specific CUDA version; a CUDA-capable GPU is recommended for performance.
  • Documentation: Detailed installation and inference guidance available in the F5-TTS documentation.

Highlighted Details

  • Supports unified and specialized models for diverse Arabic dialects (MSA, SAU, UAE, ALG, IRQ, EGY, MAR, OMN, TUN, LEV, SDN, LBY).
  • Includes benchmarking tools for evaluating TTS output: Word Error Rate (WER) computed from ASR transcripts (Meta Omnilingual-ASR-LLM-7B v1), speaker similarity (WavLM), and predicted Mean Opinion Score (MOS) via UTMOS.
  • Offers zero-shot inference capabilities, allowing generation from reference audio and text.
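
To make two of the metrics above concrete, here is a minimal, dependency-free sketch of how WER and an embedding-based speaker similarity are commonly computed. It is not the project's benchmarking code: the function names are hypothetical, the transcripts would come from an ASR model and the embeddings from a speaker encoder such as WavLM, and real evaluations usually normalize the text (for Arabic, for example, by stripping diacritics) before scoring.

```python
from math import sqrt
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / len(ref)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)
```

In this sketch, a lower WER means the synthesized speech was transcribed closer to the target text, and a cosine similarity nearer to 1 means the generated voice is closer to the reference speaker.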

Maintenance & Community

No specific details on contributors, sponsorships, or community channels (like Discord/Slack) are provided in the README.

Licensing & Compatibility

  • Code: MIT License.
  • Models:
    • Unified, SAU, UAE models: CC-BY-NC-SA-4.0 (restricted by SADA and Mixat).
    • Specialized models (ALG, EGY, IRQ, MAR, MSA): Apache 2.0.
  • Compatibility: CC-BY-NC-SA-4.0 prohibits commercial use and requires derivative works to be shared under the same terms. Apache 2.0 is generally permissive, including for commercial use. Because the license differs from model to model, users should check the terms of the specific model they deploy.
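
Since the license varies by model, a small lookup can help guard against accidentally shipping a non-commercial model. The sketch below only transcribes the licensing listed above; the names `MODEL_LICENSES` and `allows_commercial_use` are hypothetical and not part of the package, and dialects without a stated license are reported as unknown rather than guessed.

```python
from typing import Optional

# License per model, transcribed from the list above.
# Dialect models with no license stated there are deliberately omitted.
MODEL_LICENSES = {
    "Unified": "CC-BY-NC-SA-4.0",
    "SAU": "CC-BY-NC-SA-4.0",
    "UAE": "CC-BY-NC-SA-4.0",
    "ALG": "Apache-2.0",
    "EGY": "Apache-2.0",
    "IRQ": "Apache-2.0",
    "MAR": "Apache-2.0",
    "MSA": "Apache-2.0",
}

# Whether each license permits commercial use.
COMMERCIAL_USE_ALLOWED = {
    "CC-BY-NC-SA-4.0": False,
    "Apache-2.0": True,
}


def allows_commercial_use(model: str) -> Optional[bool]:
    """Return True or False for listed models, None when no license is listed."""
    license_id = MODEL_LICENSES.get(model)
    if license_id is None:
        return None
    return COMMERCIAL_USE_ALLOWED[license_id]
```

A result of `None` here means the summary does not state a license for that model, so the model's own documentation should be consulted before use.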

Limitations & Caveats

The CC-BY-NC-SA-4.0 license on the Unified, SAU, and UAE models imposes non-commercial and share-alike restrictions, which may limit adoption in commercial products. The README does not detail hardware requirements such as GPU or CUDA version, although GPU use is implied for performance. It also defers to the external F5-TTS documentation for detailed installation guidance, which suggests the README itself is a high-level overview.

Health Check

  • Last commit: 1 week ago
  • Responsiveness: Inactive
  • Pull requests (30d): 0
  • Issues (30d): 2
  • Star history: 15 stars in the last 30 days
