SonicVale  by xcLee001

AI voice generation platform for diverse content

Created 3 months ago
337 stars

Top 81.8% on SourcePulse

GitHubView on GitHub
Project Summary

A multi-character, multi-emotion AI voice generation platform, SonicVale addresses the need for automated, expressive audio creation for content like novels, scripts, and videos. It targets content creators seeking to streamline multimedia production workflows by providing a tool for generating diverse and nuanced voiceovers.

How It Works

SonicVale utilizes an Electron, Vue, and Element Plus frontend paired with a FastAPI Python backend. It integrates with IndexTTS-2.0 based Text-to-Speech (TTS) services, supporting both cloud-native builds (with H20 GPU) and local deployments. A key feature is its compatibility with LLMs via the OpenAI API protocol, allowing for flexible text processing and voice generation. The platform's core approach involves automated dialogue splitting, character and emotion binding, and precise audio editing capabilities.

Quick Start & Requirements

Installation begins with cloning the repository. Users must download ffmpeg.exe and place it in the app/core/ffmpeg/ directory. The backend is set up by installing Python dependencies (pip install -r requirements.txt) and starting the server (uvicorn app.main:app --reload --port 8200). The frontend requires Node.js for dependency installation (npm install) and startup (npm run start). Essential prerequisites include Python, Node.js, and FFmpeg. A demonstration video is available on Bilibili. While detailed documentation is mentioned, a direct URL is not provided.

Highlighted Details

  • Supports multi-character and multi-emotion voice synthesis.
  • Automated dialogue splitting from imported text sources like novels and scripts.
  • Character library management with customizable emotion and voice binding.
  • Integration with customizable LLM interfaces and TTS configurations.
  • Advanced audio editing features for fine-tuning segments and adding silence.

Maintenance & Community

Bug reports and feature suggestions should be submitted via GitHub Issues. Community interaction is facilitated through QQ Group 1060711739 (verification: "音谷配音"). Specific details regarding core maintainers, sponsorships, or a public roadmap are not provided in the README.

Licensing & Compatibility

The project is licensed under the GNU Affero General Public License v3.0 (AGPL-3.0). This license mandates that any modifications or derivative works, particularly those offered as a network service (SaaS), must be made publicly available under the same AGPL-3.0 terms.

Limitations & Caveats

SonicVale is explicitly intended for learning and research purposes only. Users are strictly prohibited from engaging in illegal activities, including unauthorized voice cloning or infringing upon intellectual property rights. All associated risks and responsibilities are assumed by the user.

Health Check
Last Commit

3 weeks ago

Responsiveness

Inactive

Pull Requests (30d)
0
Issues (30d)
2
Star History
56 stars in the last 30 days

Explore Similar Projects

Starred by Chip Huyen Chip Huyen(Author of "AI Engineering", "Designing Machine Learning Systems"), Michael Han Michael Han(Cofounder of Unsloth), and
1 more.

Orpheus-TTS by canopyai

0.2%
6k
Open-source TTS for human-sounding speech, built on Llama-3b
Created 10 months ago
Updated 1 month ago
Feedback? Help us improve.