KoljaB / TurnVoice: video voice transformation and translation CLI
Top 98.5% on SourcePulse
This command-line tool addresses the need for advanced voice transformation and translation in video content. It targets engineers, researchers, and power users seeking to modify audio tracks in videos, offering capabilities like voice cloning, multilingual translation, and stylistic voice modulation. The primary benefit is enabling cost-effective, high-control audio manipulation for video production and accessibility.
How It Works
TurnVoice processes local video files or YouTube URLs, leveraging Coqui TTS for free voice transformation and cloning, while also supporting commercial TTS engines like Elevenlabs, OpenAI, and Azure for expanded voice options. It integrates deep-translator for zero-cost video translation and allows users to apply custom speaking styles via prompting. A key feature is the Renderscript Editor, which visualizes transcriptions and speaker diarization, enabling precise adjustments to text, timings, and voice assignments before rendering.
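The renderscript workflow above (transcription, speaker diarization, and per-speaker voice assignment with editable text and timings) can be pictured with a small sketch. The class and function names here are illustrative assumptions, not TurnVoice's actual API:

```python
from dataclasses import dataclass

# Conceptual model of one diarized utterance in a renderscript:
# editable text, timing, and a speaker label to which a voice is assigned.
# Names are ours, chosen for illustration only.
@dataclass
class Segment:
    speaker: str   # diarization label, e.g. "SPEAKER_00"
    text: str      # transcribed (or translated/edited) text
    start: float   # seconds into the video
    end: float

def assign_voices(segments, voice_map, default="default"):
    """Pair each segment with the TTS voice chosen for its speaker."""
    return [(voice_map.get(s.speaker, default), s) for s in segments]

segments = [
    Segment("SPEAKER_00", "Hello and welcome.", 0.0, 1.8),
    Segment("SPEAKER_01", "Thanks for having me.", 1.9, 3.4),
]
plan = assign_voices(segments, {"SPEAKER_00": "host_voice"})
```

In the real tool, this kind of mapping is what the Renderscript Editor lets you adjust visually before rendering.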
Quick Start & Requirements
Install with pip install turnvoice (set the HF_ACCESS_TOKEN env variable). Tested on Python 3.11.4 / Windows 10. A CUDA-enabled PyTorch build is expected for GPU rendering (pip install torch==2.3.1+cu211 torchaudio==2.3.1+cu211 --index-url https://download.pytorch.org/whl/cu211); CPU-only rendering is not recommended due to high processing times. The Renderscript Editor is a local page (editor.html), and ffmpeg installation guides are provided.
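Since rendering depends on the HF_ACCESS_TOKEN environment variable, it can help to check for it before starting a long job. A minimal sketch; the helper name is ours and not part of the TurnVoice API:

```python
import os

def require_hf_token(var: str = "HF_ACCESS_TOKEN") -> str:
    """Fail fast if the Hugging Face token is missing.

    Illustrative helper only, not part of TurnVoice itself.
    """
    token = os.environ.get(var, "").strip()
    if not token:
        raise RuntimeError(f"{var} is not set; export it before running turnvoice.")
    return token
```

Failing early with a clear message beats discovering the missing token mid-render via a deep stack trace.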
Highlighted Details
Zero-cost video translation is provided via deep-translator.

Maintenance & Community
The project is described as an "early alpha / work-in-progress." Community contact is available via Twitter, Reddit, and Email. No specific contributors, sponsorships, or roadmap details are provided.
Licensing & Compatibility
The project is licensed under the Coqui Public Model License 1.0.0. No specific compatibility notes for commercial use or closed-source linking are mentioned.
Limitations & Caveats
As an early-stage project, bugs and CLI API changes are expected. Achieving perfect lip synchronization, especially after translation, may not always be possible. Speaker detection can be unreliable, and the translation feature is experimental, potentially yielding imperfect results. Synthesis may introduce audio artifacts, and the tool may struggle with complex audio mixes (e.g., voice and singing). Specific version incompatibilities between PyTorch and cuDNN are noted.
Last updated 5 months ago · Inactive