tts by wangwangit

AI platform for seamless voice and text processing

Created 5 months ago
334 stars

Top 82.2% on SourcePulse

Project Summary

VoiceCraft is a free, open-source, AI-driven platform for bidirectional voice processing. It integrates Microsoft Edge TTS (20+ Chinese voices) for speech synthesis and SiliconFlow for speech-to-text (STT). It targets users and developers who need efficient voice conversion, and it offers a responsive web UI plus OpenAI-compatible APIs, deployed on Cloudflare Workers for global, zero-configuration access.

How It Works

Deployed on Cloudflare Workers for edge computing, VoiceCraft uses Microsoft Edge TTS for high-quality speech synthesis with customizable parameters (speed, pitch, style) and the SiliconFlow API for accurate speech-to-text transcription. A unified web interface allows seamless switching between TTS and STT modes, complemented by RESTful APIs mimicking OpenAI's format for programmatic integration.
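
As an illustration of what "mimicking OpenAI's format" could look like from a client, the sketch below builds a text-to-speech request in Python. The summary does not document the actual routes or fields, so the path /v1/audio/speech, the model value, the MP3 output, and the voice name zh-CN-XiaoxiaoNeural are assumptions borrowed from OpenAI's API shape and Microsoft's Edge voice naming. Check the repository before relying on any of them.

```python
import json
import urllib.request


def build_speech_request(base_url, text, voice="zh-CN-XiaoxiaoNeural", speed=1.0):
    """Build (but do not send) an OpenAI-style text-to-speech request.

    Assumptions: the endpoint path, the "model" field, and the default
    voice name are illustrative and not confirmed by the project summary.
    """
    payload = {
        "model": "tts-1",  # assumed: OpenAI's field; the service may ignore it
        "input": text,     # text to synthesize
        "voice": voice,    # assumed Edge TTS voice identifier
        "speed": speed,    # the summary lists speed as a tunable parameter
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",  # assumed path
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(base_url, text, out_path="speech.mp3"):
    """Send the request and save the returned audio bytes (assumed MP3)."""
    request = build_speech_request(base_url, text)
    with urllib.request.urlopen(request, timeout=30) as response:
        audio = response.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

Calling synthesize("https://tts.wangwangit.com", "你好") would attempt a real network request against the public demo. Building the request separately keeps the payload easy to inspect without sending anything.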

Quick Start & Requirements

  • Web Usage: Access https://tts.wangwangit.com.
  • Local Deployment: Clone the repository, install the Wrangler CLI (npm install -g wrangler), then run wrangler dev from the project directory.
  • Prerequisites: Node.js/npm for local deployment. Audio uploads limited to 10MB.
  • Links: Demo: https://tts.wangwangit.com.

Highlighted Details

  • Bidirectional Processing: Handles both Text-to-Speech and Speech-to-Text seamlessly.
  • OpenAI TTS API Compatibility: Offers a RESTful API mimicking OpenAI's format.
  • Serverless Edge Deployment: Cloudflare Workers provide global distribution, high availability, and zero-configuration access.
  • Extensive Chinese Voices: Over 20 distinct Chinese TTS voices with adjustable parameters.

Maintenance & Community

  • Contribution: Accepts Issues and Pull Requests.
  • Community: WeChat public account "一只会飞的旺旺" for updates and support. No Discord/Slack mentioned.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The permissive MIT license allows commercial use and integration into closed-source projects.

Limitations & Caveats

  • STT Token: Programmatic STT may require your own SiliconFlow API token; the web UI uses a default token.
  • Audio Upload Limit: STT audio files are capped at 10MB.
  • External API Dependency: Relies on Microsoft Edge TTS and SiliconFlow, subject to their terms and availability.
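
A client can guard against the first two caveats before uploading anything. The following is a minimal sketch, assuming "10MB" means 10 × 1024 × 1024 bytes and that a custom token would be passed as a Bearer header. Neither detail is confirmed by the summary, and the helper names are hypothetical.

```python
import os

# Assumed interpretation of the "10MB" cap; the service might count 10 * 1000 * 1000.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


def within_limit(size_bytes, limit=MAX_UPLOAD_BYTES):
    """Return True when a payload of size_bytes fits under the upload cap."""
    return 0 <= size_bytes <= limit


def check_audio_file(path, limit=MAX_UPLOAD_BYTES):
    """Raise ValueError before upload if the file on disk is too large."""
    size = os.path.getsize(path)
    if not within_limit(size, limit):
        raise ValueError(f"{path} is {size} bytes, above the {limit}-byte limit")
    return size


def auth_headers(token=None):
    """Build request headers for a custom SiliconFlow token.

    Assumption: the token travels as a Bearer header; the project may
    expect it in a form field or another header instead.
    """
    return {"Authorization": f"Bearer {token}"} if token else {}
```

Checking the size locally avoids sending a large file only to have it rejected, and returning an empty header set when no token is supplied leaves room for whatever default the service applies.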

Health Check

  • Last Commit: 4 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 2
  • Star History: 52 stars in the last 30 days
