devnen/Kitten-TTS-Server: Lightweight, high-performance Text-to-Speech server
This project provides a self-hostable, high-performance API server and Web UI for the lightweight KittenTTS text-to-speech models. It addresses the need for an efficient, realistic TTS solution that can run on diverse hardware, from powerful servers with NVIDIA GPUs to resource-constrained edge devices like the Raspberry Pi 5. The primary benefit is offering a user-friendly, production-ready TTS engine with enhanced features like GPU acceleration and large text processing for audiobooks, significantly improving upon the base KittenTTS model.
How It Works
The server leverages KittenTTS models, ranging from 15M to 80M parameters, running via ONNX for maximum portability. It utilizes a FastAPI backend and implements an optimized inference pipeline using onnxruntime-gpu and GPU I/O binding for NVIDIA GPUs, drastically reducing latency. For long texts, it intelligently splits them into manageable chunks, processes them sequentially, and seamlessly concatenates the resulting audio, making it suitable for audiobook generation. The approach prioritizes performance and efficiency, enabling real-time synthesis even on limited hardware.
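The chunk-and-concatenate pipeline for long texts can be sketched as follows. This is a minimal illustration of the approach, not the server's actual implementation; the `split_text` helper, the `max_chars` limit, and the inserted inter-chunk pause are assumptions for the sketch:

```python
import re
import numpy as np

def split_text(text, max_chars=300):
    # Split on sentence boundaries, then pack sentences into chunks
    # no longer than max_chars so each chunk stays within the model's
    # comfortable input length.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_chars:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks

def synthesize_long(text, tts_fn, pause_ms=200, sample_rate=24000):
    # Synthesize each chunk sequentially with tts_fn (text -> float32
    # waveform) and concatenate the audio, inserting a short silence
    # between chunks so sentence boundaries sound natural.
    silence = np.zeros(int(sample_rate * pause_ms / 1000), dtype=np.float32)
    pieces = []
    for chunk in split_text(text):
        pieces.append(tts_fn(chunk))
        pieces.append(silence)
    # Drop the trailing silence appended after the last chunk.
    return np.concatenate(pieces[:-1]) if pieces else np.zeros(0, dtype=np.float32)
```

Processing chunks sequentially keeps peak memory flat regardless of input length, which is what makes audiobook-scale inputs feasible on devices like the Raspberry Pi 5.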
Quick Start & Requirements
Clone the repository (git clone https://github.com/devnen/Kitten-TTS-Server.git), install the dependencies (pip install -r requirements.txt for CPU, or pip install -r requirements-nvidia.txt for NVIDIA GPU), and run python server.py. Docker Compose is also recommended for easier deployment: docker compose up -d --build for GPU, or docker compose -f docker-compose-cpu.yml up -d --build for CPU. GPU acceleration requires onnxruntime-gpu and PyTorch with CUDA 12.1; Linux and Raspberry Pi installs additionally need libsndfile1 and ffmpeg. Once the server is running, interactive API docs are available at http://localhost:8005/docs.
Highlighted Details
The server exposes a native /tts endpoint for full control and an OpenAI-compatible /v1/audio/speech endpoint for easy integration with existing clients.
Maintenance & Community
The README does not detail specific maintainers, sponsorships, or community channels like Discord or Slack. Contributions are welcomed via GitHub issues and pull requests.
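As a quick illustration of the API surface listed under Highlighted Details, the OpenAI-compatible /v1/audio/speech endpoint can be called with a minimal stdlib client. The payload fields follow the OpenAI Audio API shape; the model and voice names below are placeholders, so consult http://localhost:8005/docs for the values your install actually exposes:

```python
import json
import urllib.request

def speech_request(text, base_url="http://localhost:8005",
                   voice="expr-voice-2-f", fmt="wav"):
    # Build a POST request for the OpenAI-compatible endpoint.
    # "kitten-tts" and the voice name are illustrative placeholders.
    payload = {
        "model": "kitten-tts",
        "input": text,
        "voice": voice,
        "response_format": fmt,
    }
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# With the server running, send the request and save the audio:
# with urllib.request.urlopen(speech_request("Hello, world!")) as resp:
#     open("out.wav", "wb").write(resp.read())
```

Because the endpoint mirrors the OpenAI schema, existing OpenAI SDK clients can typically be pointed at the server by overriding only the base URL.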
Licensing & Compatibility
Limitations & Caveats
GPU acceleration is strictly limited to NVIDIA hardware with CUDA. The installation of eSpeak NG can be a common point of failure if not performed correctly, particularly on Windows. Compilation of certain Python packages during installation on ARM architectures (like Raspberry Pi) may take a considerable amount of time (15-30 minutes).