easyVoice by cosin2077

Text-to-speech tool for long texts and multi-character dubbing

created 4 months ago
1,245 stars

Top 32.3% on sourcepulse

Project Summary

EasyVoice is an open-source text-to-speech (TTS) tool for converting long texts, such as novels, into audiobooks with multi-character narration. It is aimed at users who need to generate audio from extensive text, and it offers streaming playback, automatic subtitle generation, and AI-driven voice recommendations for different characters.

How It Works

The system uses Microsoft Azure TTS (via the Edge-TTS API) and OpenAI-compatible TTS services for speech synthesis. It streams output so that very long texts can be handled, and it accepts custom voice parameters such as rate, volume, and pitch. An AI component analyzes text segments and recommends a suitable voice and configuration for each, which enables multi-character narration. The architecture comprises a Vue 3 frontend and a Node.js backend.
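
One common way to make long texts manageable for a TTS service is to split them into bounded chunks at sentence boundaries and synthesize the chunks in order. The sketch below illustrates that idea only; it is not taken from the EasyVoice source, and the function name splitIntoChunks and the maxLen parameter are hypothetical.

```typescript
// Hypothetical helper (not from the EasyVoice codebase): split long text into
// chunks of at most maxLen UTF-16 code units, preferring sentence boundaries.
function splitIntoChunks(text: string, maxLen: number): string[] {
  if (maxLen <= 0) throw new Error("maxLen must be positive");

  // Each match is either text up to and including a run of sentence
  // terminators (plus trailing whitespace), or the unterminated tail.
  // Every character of the input ends up in exactly one match.
  const sentences =
    text.match(/[^.!?。！？\n]*[.!?。！？\n]+\s*|[^.!?。！？\n]+$/g) ?? [];

  const chunks: string[] = [];
  let current = "";

  for (const sentence of sentences) {
    // Keep appending while the sentence still fits in the current chunk.
    if (current.length + sentence.length <= maxLen) {
      current += sentence;
      continue;
    }
    if (current) {
      chunks.push(current);
      current = "";
    }
    // A single sentence longer than maxLen is cut at fixed offsets.
    // Note: slicing by code units can split a surrogate pair.
    let rest = sentence;
    while (rest.length > maxLen) {
      chunks.push(rest.slice(0, maxLen));
      rest = rest.slice(maxLen);
    }
    current = rest;
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Joining the returned chunks reproduces the input exactly, so each chunk can be sent to a synthesis call and the resulting audio segments concatenated in the same order.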

Quick Start & Requirements

  • Docker: docker run -d -p 3000:3000 -v $(pwd)/audio:/app/audio cosincox/easyvoice:latest
  • Local: Requires Node.js and pnpm. Clone the repo and run pnpm i -r. For development, run pnpm dev:root; for production, run pnpm build:root followed by pnpm start:root.
  • Audio Output: Saved to a mounted audio directory (Docker) or ./packages/backend/audio (local).
  • Docs: easyvoice.ioplus.tech

Highlighted Details

  • Supports converting over 100,000 characters of text into audiobooks.
  • Features AI-powered recommendations for voice configurations and multi-character dubbing.
  • Provides automatic subtitle generation alongside audio output.
  • Offers streaming playback for immediate audio testing and listening.
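
The summary does not say which subtitle format EasyVoice emits. Assuming the widely used SubRip (SRT) format, the following sketch shows how timed text cues map to subtitle entries. The Cue type and both function names are hypothetical and only serve the illustration.

```typescript
// Hypothetical cue shape: times in milliseconds from the start of the audio.
interface Cue {
  startMs: number;
  endMs: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function zeroPad(value: number, width: number): string {
  let s = String(value);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format milliseconds as an SRT timestamp: HH:MM:SS,mmm
function formatSrtTime(ms: number): string {
  const total = Math.max(0, Math.round(ms));
  const hours = Math.floor(total / 3600000);
  const minutes = Math.floor((total % 3600000) / 60000);
  const seconds = Math.floor((total % 60000) / 1000);
  const millis = total % 1000;
  return (
    zeroPad(hours, 2) + ":" + zeroPad(minutes, 2) + ":" +
    zeroPad(seconds, 2) + "," + zeroPad(millis, 3)
  );
}

// Render cues as SRT: a 1-based index, a time range, the text, and a
// blank line between entries.
function toSrt(cues: Cue[]): string {
  return cues
    .map(
      (cue, i) =>
        (i + 1) + "\n" +
        formatSrtTime(cue.startMs) + " --> " + formatSrtTime(cue.endMs) + "\n" +
        cue.text + "\n",
    )
    .join("\n");
}
```

In a pipeline that synthesizes text segment by segment, each segment's audio duration could supply the start and end times of its cue.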

Maintenance & Community

The project is actively maintained by cosin2077. Future plans include integrating official TTS APIs, Google TTS, and voice cloning.

Licensing & Compatibility

The project is released under the MIT License, permitting commercial use and integration with closed-source projects.

Limitations & Caveats

The quality of AI voice recommendations depends on the capabilities of the underlying large language model, and the recommendation step can be slower than direct TTS generation. The Edge-TTS API may also impose rate and concurrency limits.
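
When a TTS backend limits concurrent requests, a client can cap how many synthesis calls are in flight at once. The sketch below shows a generic concurrency-limited map under that assumption. It is not EasyVoice's implementation, and mapWithConcurrency is a hypothetical name.

```typescript
// Hypothetical utility: run worker over items with at most `limit` calls
// in flight at any time, returning results in input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  worker: (item: T, index: number) => Promise<R>,
): Promise<R[]> {
  if (limit < 1) throw new Error("limit must be at least 1");

  const results: R[] = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  // Each runner claims the next index, awaits the worker, and repeats
  // until no items remain. Claiming is synchronous, so no index is
  // processed twice.
  async function run(): Promise<void> {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  const runners: Promise<void>[] = [];
  const runnerCount = Math.min(limit, items.length);
  for (let r = 0; r < runnerCount; r++) runners.push(run());

  // If any worker rejects, the returned promise rejects with that error.
  await Promise.all(runners);
  return results;
}
```

A caller would pass its own request function as worker, for example a function that sends one text chunk to the TTS service, and choose limit to stay within whatever bound the service tolerates.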

Health Check

  • Last commit: 2 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 5

Star History

535 stars in the last 90 days
