easyVoice by cosin2077

Text-to-speech tool for long texts and multi-character dubbing

Created 11 months ago

1,948 stars

Top 22.1% on SourcePulse

Project Summary

EasyVoice is an open-source text-to-speech (TTS) tool for converting long texts, such as novels, into high-quality audiobooks with multi-character narration. It is aimed at users who need to produce audio from lengthy texts, and it offers streaming playback, automatic subtitle generation, and AI-driven voice recommendations for different characters.

How It Works

The system synthesizes speech with Microsoft's neural voices through the Edge-TTS API and with OpenAI-compatible TTS services. It streams audio so that texts of arbitrary length can be processed, and it lets users set voice parameters such as rate, volume, and pitch. An AI component analyzes text segments and recommends a suitable voice and settings for each one, which enables multi-character narration. The architecture consists of a Vue 3 frontend and a Node.js backend.
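The summary does not describe how easyVoice segments text or passes voice parameters to the TTS service, so the following is only a minimal TypeScript sketch under two assumptions: long input is split into chunks at sentence boundaries, and each chunk is wrapped in SSML whose prosody element carries Edge-TTS-style rate, volume, and pitch strings such as "+10%" or "+0Hz". The names VoiceSettings, splitText, escapeXml, and toSsml, and the voice name in the comment, are hypothetical.

```typescript
// Hypothetical per-character voice settings. The string formats mirror
// Edge-TTS-style SSML prosody values (an assumption, not easyVoice's API).
interface VoiceSettings {
  voice: string;  // illustrative voice name, e.g. "en-US-AriaNeural"
  rate: string;   // e.g. "+10%"
  volume: string; // e.g. "+0%"
  pitch: string;  // e.g. "+0Hz"
}

// Split text into chunks of at most maxLen UTF-16 code units, breaking after
// sentence-ending punctuation (ASCII or CJK) where possible. A sentence that
// is longer than maxLen on its own is hard-split.
function splitText(text: string, maxLen: number): string[] {
  // Each match is a sentence plus its trailing punctuation and whitespace,
  // or the unterminated tail of the text.
  const sentences = text.match(/[^.!?。！？]*[.!?。！？]+\s*|[^.!?。！？]+$/g) ?? [];
  const chunks: string[] = [];
  let current = "";
  const flush = (): void => {
    const trimmed = current.trim();
    if (trimmed) chunks.push(trimmed);
    current = "";
  };
  for (const sentence of sentences) {
    // Start a new chunk if this sentence would overflow the current one.
    if (current.length + sentence.length > maxLen) flush();
    let rest = sentence;
    while (rest.length > maxLen) {
      const piece = rest.slice(0, maxLen).trim();
      if (piece) chunks.push(piece);
      rest = rest.slice(maxLen);
    }
    current += rest;
  }
  flush();
  return chunks;
}

// Escape characters that are special in XML text and attribute values.
function escapeXml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Wrap one chunk in an SSML document that applies the voice settings.
function toSsml(chunk: string, v: VoiceSettings, lang = "en-US"): string {
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${escapeXml(lang)}">` +
    `<voice name="${escapeXml(v.voice)}">` +
    `<prosody rate="${escapeXml(v.rate)}" volume="${escapeXml(v.volume)}" pitch="${escapeXml(v.pitch)}">` +
    escapeXml(chunk) +
    `</prosody></voice></speak>`
  );
}
```

Each resulting chunk could then be synthesized in order and streamed as soon as its audio is ready, which is one way a tool can accept input of unbounded length while sending only bounded requests. Any chunk-size limit passed to splitText is a placeholder; the real per-request limit of the TTS service is not stated in this summary.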

Quick Start & Requirements

Docker: docker run -d -p 3000:3000 -v $(pwd)/audio:/app/audio cosincox/easyvoice:latest
Local: Requires Node.js and pnpm. Clone the repo, run pnpm i -r, then pnpm dev:root (development) or pnpm build:root and pnpm start:root (production).
Audio Output: Saved to ./audio on the host, which the Docker command mounts at /app/audio, or to ./packages/backend/audio when running locally.
Docs: easyvoice.ioplus.tech

Highlighted Details

Supports converting over 100,000 characters of text into audiobooks.
Features AI-powered recommendations for voice configurations and multi-character dubbing.
Provides automatic subtitle generation alongside audio output.
Offers streaming playback, so audio can be previewed and listened to as it is generated.
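The summary does not say how subtitles are timed. One simple approach, assumed here rather than taken from easyVoice, is to emit one SubRip (SRT) cue per synthesized segment and derive cue times from the cumulative audio durations. The TimedSegment type and the pad, srtTime, and toSrt helpers are hypothetical.

```typescript
// Hypothetical input: the text of each synthesized segment and the duration
// of its audio in milliseconds, in playback order.
interface TimedSegment {
  text: string;
  durationMs: number;
}

// Zero-pad a non-negative integer to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format an offset in milliseconds as an SRT timestamp (HH:MM:SS,mmm).
function srtTime(totalMs: number): string {
  const ms = Math.round(totalMs);
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms % 1000, 3)}`;
}

// Build an SRT file with one numbered cue per segment; each cue starts
// where the previous segment's audio ended.
function toSrt(segments: TimedSegment[]): string {
  const cues: string[] = [];
  let startMs = 0;
  segments.forEach((seg, i) => {
    const endMs = startMs + Math.round(seg.durationMs);
    cues.push(`${i + 1}\n${srtTime(startMs)} --> ${srtTime(endMs)}\n${seg.text}\n`);
    startMs = endMs;
  });
  return cues.join("\n");
}
```

Per-segment cues are coarse: a long segment appears as a single subtitle. Finer timing would need word-level timestamps from the TTS service, where the service provides them.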

Maintenance & Community

The project is actively maintained by cosin2077. Future plans include integrating official TTS APIs, Google TTS, and voice cloning.

Licensing & Compatibility

The project is released under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The quality of AI voice recommendations depends on the capabilities of the underlying large language model. Generating audio with AI recommendations can be slower than plain TTS generation. The Edge-TTS API may also impose rate and concurrency limits.
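The summary only notes that Edge-TTS may be rate-limited; it does not say how easyVoice copes with that. A common mitigation is to cap the number of concurrent synthesis requests and retry failures with exponential backoff. The sketch below shows that pattern in TypeScript; runWithLimit is a hypothetical helper, and its default limit, retry count, and delays are arbitrary placeholders, not values from easyVoice or Microsoft.

```typescript
// Run async tasks with at most `limit` in flight. A task that rejects is
// retried up to `maxRetries` times with exponential backoff
// (baseDelayMs, 2 * baseDelayMs, 4 * baseDelayMs, ...). Results keep input
// order. If a task still fails after its retries, the returned promise
// rejects; other workers are not cancelled.
async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  limit: number,
  maxRetries = 3,
  baseDelayMs = 500,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  const sleep = (ms: number): Promise<void> =>
    new Promise<void>((resolve) => setTimeout(resolve, ms));

  async function attempt(task: () => Promise<T>): Promise<T> {
    for (let i = 0; ; i++) {
      try {
        return await task();
      } catch (err) {
        if (i >= maxRetries) throw err; // retries exhausted: give up
        await sleep(baseDelayMs * 2 ** i);
      }
    }
  }

  // Each worker claims the next unclaimed index. The read-and-increment is
  // synchronous, so two workers never claim the same task.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await attempt(tasks[index]);
    }
  }

  const workers: Promise<void>[] = [];
  const count = Math.max(1, Math.min(limit, tasks.length));
  for (let w = 0; w < count; w++) workers.push(worker());
  await Promise.all(workers);
  return results;
}
```

A worker keeps its slot while it waits to retry, so backoff never pushes the number of in-flight requests above the limit. A small limit with backoff trades throughput for fewer rejected requests; suitable values depend on service limits this summary does not specify.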

Explore Similar Projects

praises by ElmTran

curses by mmpneo

SonicVale by xcLee001

FluidVoice by altic-dev

Open-VoiceCanvas by ItusiAI

ChatWaifu by cjyaddone

Easy-Voice-Toolkit by Spr-Aachen

tts by zuoban

LanguageLeapAI by SociallyIneptWeeb

AI-Waifu-Vtuber by ardha27

Orpheus-TTS by canopyai

voice-pro by abus-aikorea