Genie by High-Logic

CPU-optimized TTS inference engine

Created 3 weeks ago

595 stars

Top 54.8% on SourcePulse

Project Summary

Genie is a lightweight inference engine and model converter for the GPT-SoVITS speech synthesis project. It targets users who need efficient, CPU-based speech synthesis with low latency and a small runtime footprint, offering a convenient API server and model conversion tools.

How It Works

Genie optimizes the GPT-SoVITS V2 model for CPU inference, achieving significantly lower latency and a much smaller runtime size compared to the official PyTorch or ONNX models. This is accomplished through ONNX model conversion and specific optimizations tailored for CPU performance, making it suitable for applications where GPU resources are limited or not cost-effective.

Quick Start & Requirements

  • Installation: pip install genie-tts
  • Prerequisites: Python >= 3.9. Windows users may need Visual Studio Build Tools with the "Desktop development with C++" workload for pyopenjtalk installation.
  • Quick Tryout: Includes predefined characters for immediate use without requiring model files.
  • Documentation: Demo Video, API Server Tutorial
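Because the package requires Python >= 3.9, checking the interpreter version before installing can save a failed build. A minimal stdlib sketch (the helper name is illustrative, not part of genie-tts):

```python
import sys

# genie-tts requires Python >= 3.9 (per the prerequisites above).
MIN_VERSION = (3, 9)

def meets_requirement(version=sys.version_info):
    """Return True if the interpreter satisfies the package's minimum version."""
    return tuple(version[:2]) >= MIN_VERSION

if not meets_requirement():
    raise SystemExit("genie-tts needs Python >= 3.9")
```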

Highlighted Details

  • Achieves 1.13s first inference latency on CPU (i7-13620H), outperforming official PyTorch (1.35s) and ONNX (3.57s) models.
  • Runtime size is approximately 200MB, with model sizes around 230MB, significantly smaller than the multi-GB official PyTorch models.
  • Supports GPT-SoVITS V2 models and Japanese language.
  • Includes tools for ONNX model conversion and a FastAPI server for API access.
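The latency figures quoted above imply the following speedups; a quick back-of-envelope check:

```python
# First-inference latency on an i7-13620H CPU, per the project's benchmarks.
genie_s = 1.13    # Genie (optimized ONNX)
pytorch_s = 1.35  # official PyTorch model
onnx_s = 3.57     # official ONNX model

speedup_vs_pytorch = pytorch_s / genie_s
speedup_vs_onnx = onnx_s / genie_s

print(f"vs PyTorch: {speedup_vs_pytorch:.2f}x")  # vs PyTorch: 1.19x
print(f"vs ONNX:    {speedup_vs_onnx:.2f}x")     # vs ONNX:    3.16x
```

So the optimized model is a modest ~1.2x faster than the official PyTorch path, but over 3x faster than the official ONNX export.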

Maintenance & Community

  • The project is actively maintained by High-Logic.
  • Roadmap includes support for more languages (Chinese, English), future GPT-SoVITS versions (V2Proplus, V3, V4), and easier deployment options like Docker images.

Licensing & Compatibility

  • The README does not explicitly state the license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

  • Currently supports only GPT-SoVITS V2 models and Japanese language.
  • The project targets CPU inference only; the README does not mention GPU acceleration.
  • Installation of pyopenjtalk may require C++ build tools on Windows.
Health Check

  • Last Commit: 2 weeks ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 2
  • Issues (30d): 2
  • Star History: 597 stars in the last 24 days
