Kokoro-TTS-Local by PierrunoYT

Local Text-to-Speech synthesis engine

Created 1 year ago

312 stars

Top 86.2% on SourcePulse

Summary

Kokoro TTS Local offers a self-hosted, offline-capable implementation of the Kokoro-82M text-to-speech model. It targets developers and power users seeking a flexible, multi-voice TTS solution with both CLI and web interfaces, enabling local, private speech synthesis.

How It Works

This project wraps the Kokoro-82M model, loading modules dynamically and managing dependencies automatically to simplify setup. It provides an interactive CLI and a Gradio-based web interface, both of which generate speech locally. Models and voices are downloaded on demand from Hugging Face; once they are cached, the tool can run offline.
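
The download-on-demand behaviour described above can be sketched as a small cache-first helper. This is a minimal illustration, not the project's actual code: `ensure_asset`, the `fetch` callable, and any asset names passed to it are hypothetical. In a real setup `fetch` would presumably wrap a Hugging Face download call.

```python
from pathlib import Path
from typing import Callable


def ensure_asset(cache_dir: Path, name: str, fetch: Callable[[str], bytes]) -> Path:
    """Return the local path of an asset, downloading it only if it is missing.

    `fetch` is a hypothetical callable that takes an asset name and returns
    its bytes; it is only invoked on a cache miss.
    """
    target = cache_dir / name
    if target.exists():
        # Cache hit: no network access is needed, which is what makes
        # offline use possible after the first run.
        return target
    # Cache miss: create any missing parent folders, then store the download.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(fetch(name))
    return target
```

Because the network call is isolated behind `fetch`, a second request for the same asset never touches the network.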

Quick Start & Requirements

Prerequisites: Python 3.8+, FFmpeg (optional for MP3/AAC), CUDA-compatible GPU (optional for acceleration), Git.
Installation: Clone the repository, create a Python virtual environment, and run pip install -r requirements.txt. Alternatively, for a simpler install, run pip install kokoro soundfile and install espeak-ng (or espeak) through the system package manager.
GPU Acceleration: Requires specific PyTorch installations tailored to CUDA versions (e.g., cu118, cu121, cu126, cu128). Verify with import torch; print(torch.cuda.is_available()).
Documentation: IMPROVEMENTS.md, OFFLINE_USAGE.md, CHINESE_TTS_GUIDE.md, README_CHINESE_TTS.md.
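
The prerequisites listed above can be checked with a short script before installing. The following is a rough sketch using only the standard library plus an optional PyTorch import; the function name `check_prerequisites` and the report keys are illustrative and are not part of the project.

```python
import importlib.util
import shutil
import sys


def check_prerequisites() -> dict:
    """Report which of the listed prerequisites are available on this machine."""
    report = {
        "python_ok": sys.version_info >= (3, 8),
        "git": shutil.which("git") is not None,
        # FFmpeg is optional; it is only needed for MP3/AAC output.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda": False,
    }
    if report["torch_installed"]:
        import torch  # imported lazily so the check works without PyTorch

        report["cuda"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

A "no" for ffmpeg or cuda is not fatal, since both are optional; a "no" for python_ok or git means the installation steps will not work as written.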

Highlighted Details

Extensive voice library: 54 voices across 8 languages (English in American and British variants, Japanese, Mandarin Chinese, Spanish, French, Hindi, Italian, Brazilian Portuguese).
Full offline mode support after initial internet-connected setup.
Enhanced security features, including secure model loading (weights_only=True), private Gradio interfaces (share=False), and comprehensive input validation.
Centralized configuration management via config.py and detailed dependency validation with dependency_checker.py.
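
Of the security measures listed above, input validation is the easiest to illustrate in isolation. The sketch below shows one plausible shape for it. The function name, the character limit, and the voice-identifier pattern are all assumptions made for illustration; the project's real checks may differ.

```python
import re

MAX_CHARS = 5000  # hypothetical limit, not taken from the project

# Voice identifiers are assumed to consist of lowercase letters, digits and
# underscores only, so they can never contain path separators.
VOICE_ID = re.compile(r"[a-z0-9_]+")


def validate_request(text: str, voice: str) -> str:
    """Return cleaned text, or raise ValueError if the request is unacceptable."""
    if not VOICE_ID.fullmatch(voice):
        raise ValueError(f"invalid voice id: {voice!r}")
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("text is empty")
    if len(cleaned) > MAX_CHARS:
        raise ValueError(f"text exceeds {MAX_CHARS} characters")
    return cleaned
```

Rejecting unexpected characters in the voice identifier matters if the identifier is later used to build a file path. The other two measures are single arguments on existing calls: weights_only=True on torch.load and share=False on the Gradio launch call.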

Maintenance & Community

No specific details on maintainers, community channels (e.g., Discord, Slack), or active sponsorships are provided in the README. Contribution guidelines are present.

Licensing & Compatibility

Licensed under Apache 2.0. This license is generally permissive for commercial use and integration into closed-source projects.

Limitations & Caveats

Full functionality depends on optional external dependencies: FFmpeg for MP3/AAC output and CUDA for GPU acceleration. Voice quality varies between voices, as indicated by the letter grades (A-F) assigned to them. Offline mode requires a prior online run to download the necessary assets.
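
The offline caveat can be made explicit in a launcher script. The sketch below assumes the assets are fetched through the huggingface_hub library, which honours the HF_HUB_OFFLINE environment variable; whether this project relies on that variable is not stated here. The helper name and the cache directory passed to it are hypothetical.

```python
import os
from pathlib import Path


def enable_offline_if_cached(cache_dir: Path) -> bool:
    """Set HF_HUB_OFFLINE=1 when `cache_dir` already contains files.

    Returns True if offline mode was enabled, False if a first online run
    is still needed to populate the cache.
    """
    has_assets = cache_dir.is_dir() and any(p.is_file() for p in cache_dir.rglob("*"))
    if has_assets:
        # Must be set before huggingface_hub is imported, because the library
        # reads this variable at import time.
        os.environ["HF_HUB_OFFLINE"] = "1"
    return has_assets
```

Calling this at the very top of an entry point, before any model code is imported, keeps a machine with a populated cache from attempting network requests.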
