piper by rhasspy

Local neural text-to-speech system

Created 3 years ago

10,427 stars

Top 4.9% on SourcePulse

1 Expert Loves This Project

Tim J. Baek

Founder of Open WebUI

Project Summary

Piper is a fast, local neural text-to-speech system optimized for edge devices like the Raspberry Pi 4, offering high-quality voice synthesis without cloud dependencies. It is designed for users and developers seeking efficient, private speech output for applications such as smart home assistants, accessibility tools, and embedded systems.

How It Works

Piper runs VITS (Variational Inference with adversarial learning for end-to-end Text-to-Speech) models that have been exported to ONNX format. Running exported ONNX models keeps inference lightweight, which allows rapid on-device processing and broad hardware compatibility. The system also supports multi-speaker models and can stream audio output for real-time applications.
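
As a rough illustration of the multi-speaker support mentioned above, the sketch below reads the .onnx.json config that accompanies a voice model and reports its sample rate and speakers. The key names (audio.sample_rate, num_speakers, speaker_id_map) are assumptions based on commonly published Piper voice configs, and describe_voice is a hypothetical helper; check both against the config shipped with your voice.

```python
import json
from pathlib import Path


def describe_voice(config_path):
    """Summarize a voice config file.

    The key names used here are assumed, not guaranteed; missing keys
    fall back to neutral defaults instead of raising.
    """
    config = json.loads(Path(config_path).read_text(encoding="utf-8"))
    audio = config.get("audio", {})
    return {
        # Sample rate of the synthesized audio, or None if not declared.
        "sample_rate": audio.get("sample_rate"),
        # Single-speaker voices are assumed when the key is absent.
        "num_speakers": config.get("num_speakers", 1),
        # Speaker names, if the voice maps names to numeric ids.
        "speakers": sorted(config.get("speaker_id_map", {})),
    }
```

Calling describe_voice("en_US-lessac-medium.onnx.json") on a downloaded voice would then show whether a speaker id needs to be chosen before synthesis.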

Quick Start & Requirements

Install: pip install piper-tts (Python) or download binary releases.
Prerequisites: For GPU acceleration, install onnxruntime-gpu and ensure a CUDA environment is set up.
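Usage: Download a voice model (both the .onnx file and its matching .onnx.json config), then pipe text to the binary on the command line: echo 'Hello' | ./piper --model en_US-lessac-medium.onnx --output_file hello.wav
Resources: Each voice model is stored locally, so budget disk space for every voice you install. CPU inference is optimized for the Raspberry Pi 4, so a GPU is optional.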
Docs: https://github.com/rhasspy/piper
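
The command-line usage above can be driven from Python with the standard library. This is a minimal sketch, assuming the piper binary sits in the current directory and using only the --model and --output_file flags shown in the Usage line; build_piper_command and synthesize are hypothetical helper names, not part of Piper.

```python
import subprocess


def build_piper_command(model_path, output_path, executable="./piper"):
    """Return the argv list equivalent to the shell command in the Usage line."""
    return [executable, "--model", str(model_path), "--output_file", str(output_path)]


def synthesize(text, model_path, output_path, executable="./piper"):
    """Pipe `text` to piper on stdin so it writes a WAV file to `output_path`."""
    command = build_piper_command(model_path, output_path, executable)
    # check=True raises CalledProcessError if piper exits with a non-zero status.
    subprocess.run(command, input=text.encode("utf-8"), check=True)
    return output_path
```

For example, synthesize("Hello", "en_US-lessac-medium.onnx", "hello.wav") mirrors the echo pipeline. Passing an argument list rather than a shell string avoids quoting problems when the text contains apostrophes.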

Highlighted Details

Supports over 40 languages with downloadable voice models.
Optimized for Raspberry Pi 4 and other ARM devices.
Offers JSON input for structured TTS requests.
Can stream raw audio output for low-latency playback.
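
The last two items can be sketched with the standard library. The first helper builds one JSON object per line for structured requests; the field names text, speaker_id, and output_file are assumptions recalled from the Piper README and should be verified there. The second wraps a raw audio stream in a WAV container, assuming 16-bit signed little-endian mono PCM and a default sample rate of 22050 Hz (the real rate comes from the voice's config). Both function names are hypothetical.

```python
import io
import json
import wave


def make_request_line(text, speaker_id=None, output_file=None):
    """Serialize one TTS request as a single JSON line (field names assumed)."""
    request = {"text": text}
    if speaker_id is not None:
        request["speaker_id"] = speaker_id
    if output_file is not None:
        request["output_file"] = output_file
    # json.dumps escapes embedded newlines, so each request stays on one line.
    return json.dumps(request, ensure_ascii=False)


def pcm_to_wav_bytes(pcm, sample_rate=22050):
    """Wrap raw 16-bit mono PCM bytes in a WAV header and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)   # mono (assumed)
        wav.setsampwidth(2)   # 2 bytes per sample = 16-bit (assumed)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```

For low-latency playback the raw chunks would normally go straight to an audio device; the WAV wrapper is only useful when the stream also needs to be saved or inspected.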

Maintenance & Community

Piper is actively developed by Rhasspy and has been integrated into projects like Home Assistant and NVDA. Community support channels are available via Discord/Slack.

Licensing & Compatibility

Piper's code is released under the permissive MIT license. Voice models carry their own licenses, which may differ from the code's, so review each model's terms before commercial use or redistribution.

Limitations & Caveats

Voice quality varies significantly between languages and between individual models. Building from source requires downloading piper-phonemize and extracting it into a specific directory structure. GPU acceleration requires an NVIDIA GPU with a working CUDA setup.

Health Check

Last Commit

4 months ago

Responsiveness

1 week

Star History

131 stars in the last 30 days