Kokoro-TTS-Local by PierrunoYT

Local Text-to-Speech synthesis engine

Created 1 year ago
267 stars

Top 96.1% on SourcePulse


Summary

Kokoro TTS Local offers a self-hosted, offline-capable implementation of the Kokoro-82M text-to-speech model. It targets developers and power users seeking a flexible, multi-voice TTS solution with both CLI and web interfaces, enabling local, private speech synthesis.

How It Works

The project wraps the Kokoro-82M model and uses dynamic module loading and automatic dependency management to simplify setup. It provides an interactive CLI and a Gradio-based web interface for generating speech locally. Models and voices are downloaded on demand from Hugging Face. Once they are cached, the tool can run fully offline.
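
The "download on demand, then run offline" behaviour can be illustrated with a small cache-first helper. This is a minimal sketch, not the project's code: ensure_asset is a hypothetical name, and the injected fetch callable stands in for whatever Hugging Face download call the real tool uses. Cached files are returned without touching the network, and offline mode fails clearly for assets that were never downloaded.

```python
from pathlib import Path
from typing import Callable


def ensure_asset(cache_dir: Path, name: str,
                 fetch: Callable[[str], bytes], offline: bool = False) -> Path:
    """Return the local path for `name`, fetching it only if it is not cached yet.

    `fetch` is a hypothetical stand-in for the real downloader (e.g. a Hugging Face call).
    """
    path = cache_dir / name
    if path.exists():
        # Cached by an earlier online run: no network access is needed.
        return path
    if offline:
        # Offline mode cannot recover an asset that was never downloaded.
        raise FileNotFoundError(f"{name} is not cached; run once with network access first")
    path.parent.mkdir(parents=True, exist_ok=True)
    data = fetch(name)
    # Write to a temporary file first, then rename it into place,
    # so a partially written download is never mistaken for a cached asset.
    tmp = path.with_name(path.name + ".part")
    tmp.write_bytes(data)
    tmp.replace(path)
    return path
```

On the first call the helper invokes fetch and stores the result. Later calls, including ones made with offline=True, reuse the cached file.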

Quick Start & Requirements

  • Prerequisites: Python 3.8+, Git, FFmpeg (optional; needed only for MP3/AAC output), and a CUDA-compatible GPU (optional; used for acceleration).
  • Installation: Clone the repository, create a Python virtual environment, and run pip install -r requirements.txt. As a simpler alternative, run pip install kokoro soundfile and install espeak-ng (or espeak) with the system package manager.
  • GPU Acceleration: Requires a PyTorch build that matches the installed CUDA version (e.g., cu118, cu121, cu126, cu128). Verify with import torch; print(torch.cuda.is_available()), which should print True.
  • Documentation: IMPROVEMENTS.md, OFFLINE_USAGE.md, CHINESE_TTS_GUIDE.md, README_CHINESE_TTS.md.
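
The prerequisites above can be checked before installation with a short standard-library script. This is a hedged sketch, not the project's dependency_checker.py. The function name check_prerequisites is hypothetical, and the script assumes the tools are on PATH under the executable names ffmpeg, espeak-ng/espeak and git. PyTorch is treated as optional: if it is not installed, the script reports CPU-only operation.

```python
import shutil
import sys


def check_prerequisites() -> dict:
    """Report which prerequisites are available; only Python 3.8+ is mandatory."""
    report = {
        "python_ok": sys.version_info >= (3, 8),
        # FFmpeg is needed only for MP3/AAC output.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # espeak-ng is preferred; plain espeak is accepted as a fallback.
        "espeak": any(shutil.which(b) is not None for b in ("espeak-ng", "espeak")),
        "git": shutil.which("git") is not None,
    }
    try:
        import torch  # optional: may be absent before installation
        report["cuda"] = bool(torch.cuda.is_available())
    except ImportError:
        report["cuda"] = False
    # Fall back to CPU when no CUDA-enabled PyTorch build is available.
    report["device"] = "cuda" if report["cuda"] else "cpu"
    return report


if __name__ == "__main__":
    for key, value in check_prerequisites().items():
        print(f"{key}: {value}")
```

Missing optional tools are reported rather than treated as errors, matching the README's distinction between required and optional dependencies.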

Highlighted Details

  • Extensive voice library: 54 voices across 8 languages, with English split into American and British variants (American English, British English, Japanese, Mandarin Chinese, Spanish, French, Hindi, Italian, Brazilian Portuguese).
  • Full offline mode support after initial internet-connected setup.
  • Enhanced security features, including secure model loading (weights_only=True), private Gradio interfaces (share=False), and comprehensive input validation.
  • Centralized configuration management via config.py and detailed dependency validation with dependency_checker.py.
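
The following sketch shows what input validation for the voice library might look like. It is illustrative only and does not reproduce the project's actual checks. It assumes the upstream Kokoro-82M voice-ID convention (for example af_heart or bm_george): the first letter encodes the language, with a and b as the two English variants, and the second letter is f or m. The names parse_voice and validate_request and the MAX_CHARS limit are hypothetical.

```python
import re
from typing import NamedTuple

# Assumed Kokoro-82M prefix letters: nine codes for eight languages (English has two).
LANGUAGES = {
    "a": "American English", "b": "British English", "j": "Japanese",
    "z": "Mandarin Chinese", "e": "Spanish", "f": "French",
    "h": "Hindi", "i": "Italian", "p": "Brazilian Portuguese",
}
VOICE_RE = re.compile(r"([a-z])([fm])_([a-z0-9]+)")
MAX_CHARS = 5000  # hypothetical limit, not taken from the project


class Voice(NamedTuple):
    language: str
    gender: str
    name: str


def parse_voice(voice_id: str) -> Voice:
    """Split a voice ID such as 'af_heart' into language, gender and name."""
    match = VOICE_RE.fullmatch(voice_id)
    if match is None or match.group(1) not in LANGUAGES:
        raise ValueError(f"invalid voice ID: {voice_id!r}")
    lang, gender, name = match.groups()
    return Voice(LANGUAGES[lang], "female" if gender == "f" else "male", name)


def validate_request(text: str, voice_id: str) -> Voice:
    """Reject empty or oversized text and unknown voices before synthesis."""
    if not text.strip():
        raise ValueError("text must not be empty")
    if len(text) > MAX_CHARS:
        raise ValueError(f"text exceeds {MAX_CHARS} characters")
    return parse_voice(voice_id)
```

For example, validate_request("Hello", "pf_dora") resolves to a Brazilian Portuguese female voice. An ID such as "xf_test" is rejected because x is not a known language prefix.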

Maintenance & Community

No specific details on maintainers, community channels (e.g., Discord, Slack), or active sponsorships are provided in the README. Contribution guidelines are present.

Licensing & Compatibility

Licensed under Apache 2.0. This license is generally permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

Full functionality depends on optional external components: FFmpeg for MP3/AAC output and CUDA for GPU acceleration. Voice quality varies from voice to voice, as reflected in the provided quality grades (A to F). Offline mode works only after an initial online run has downloaded the required models and voices.

Health Check

  • Last Commit: 2 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 10 stars in the last 30 days
