TurnVoice by KoljaB

Video voice transformation and translation CLI

Created 2 years ago
256 stars

Top 98.5% on SourcePulse

View on GitHub
Project Summary

This command-line tool addresses the need for advanced voice transformation and translation in video content. It targets engineers, researchers, and power users seeking to modify audio tracks in videos, offering capabilities like voice cloning, multilingual translation, and stylistic voice modulation. The primary benefit is enabling cost-effective, high-control audio manipulation for video production and accessibility.

How It Works

TurnVoice processes local video files or YouTube URLs, leveraging Coqui TTS for free voice transformation and cloning, while also supporting commercial TTS engines like Elevenlabs, OpenAI, and Azure for expanded voice options. It integrates deep-translator for zero-cost video translation and allows users to apply custom speaking styles via prompting. A key feature is the Renderscript Editor, which visualizes transcriptions and speaker diarization, enabling precise adjustments to text, timings, and voice assignments before rendering.
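
The sketch below is not TurnVoice's own code; it only illustrates the two free building blocks named above, deep-translator for translation and Coqui XTTS for cloned synthesis, assuming both Python packages are installed and that a short reference clip of the target voice (reference_voice.wav, a hypothetical file) is available:

    # Minimal sketch of the free translate-then-clone path described above;
    # not TurnVoice's internal code. Requires: pip install deep-translator TTS
    from deep_translator import GoogleTranslator
    from TTS.api import TTS

    # 1) Translate a transcribed sentence at zero cost via deep-translator.
    sentence = "Welcome back, today we are looking at voice cloning."
    translated = GoogleTranslator(source="auto", target="de").translate(sentence)

    # 2) Re-speak the translated text in a cloned voice with Coqui XTTS v2.
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    tts.tts_to_file(
        text=translated,
        speaker_wav="reference_voice.wav",  # hypothetical clip of the voice to clone
        language="de",
        file_path="translated_line.wav",
    )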

Quick Start & Requirements

  • Installation: pip install turnvoice
  • Prerequisites: Nvidia GPU (>8 GB VRAM recommended), NVIDIA CUDA Toolkit 12.1, NVIDIA cuDNN (v9.5.0 tested), Rubberband command-line utility, ffmpeg command-line utility, Huggingface access token (HF_ACCESS_TOKEN env variable). Tested on Python 3.11.4 / Windows 10.
  • Setup: GPU acceleration requires a CUDA 12.1 build of PyTorch (pip install torch==2.3.1+cu121 torchaudio==2.3.1+cu121 --index-url https://download.pytorch.org/whl/cu121). CPU-only rendering is not recommended due to high processing times. A quick environment check is sketched after this list.
  • Links: Renderscript Editor (editor.html), ffmpeg installation guides provided.
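
Because rendering depends on a working CUDA stack, the external tools, and the Huggingface token, a small pre-flight check along these lines (an illustration, not part of TurnVoice) can catch a misconfigured environment before a long run:

    # Pre-flight check for the prerequisites listed above; not part of TurnVoice.
    import os
    import shutil
    import torch

    # GPU / CUDA: CPU-only rendering runs, but is very slow.
    print("CUDA available:", torch.cuda.is_available())
    print("Torch CUDA build:", torch.version.cuda)  # the docs test against CUDA 12.1
    if torch.cuda.is_available():
        print("GPU:", torch.cuda.get_device_name(0))

    # External command-line tools the pipeline calls.
    for tool in ("ffmpeg", "rubberband"):
        print(f"{tool} on PATH:", shutil.which(tool) is not None)

    # Huggingface access token named in the prerequisites.
    print("HF_ACCESS_TOKEN set:", bool(os.environ.get("HF_ACCESS_TOKEN")))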

Highlighted Details

  • Zero-cost voice transformation and cloning via Coqui TTS.
  • Zero-cost video translation using deep-translator.
  • AI-powered voice style modulation through prompting.
  • Precise rendering control with a visual script editor.
  • Supports local video file processing and preserves background audio (the final audio muxing step is sketched below).
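
This summary does not describe how TurnVoice itself keeps the background audio; purely as an illustration of the final assembly step such a pipeline needs, the sketch below drives ffmpeg (already a prerequisite) from Python to mix a synthesized voice over the original track and remux it into the video without re-encoding the picture. All file names are hypothetical.

    # Illustration only, not TurnVoice's actual implementation: mix a synthesized
    # voice track over the video's original audio and remux without touching video.
    import subprocess

    video_in  = "talk.mp4"               # hypothetical source video
    voice_wav = "translated_voice.wav"   # hypothetical synthesized speech track
    video_out = "talk_dubbed.mp4"

    subprocess.run([
        "ffmpeg", "-y",
        "-i", video_in,                   # input 0: original video + audio
        "-i", voice_wav,                  # input 1: newly synthesized voice
        # Mix the original track with the new voice; real background preservation
        # would first separate vocals from music/ambience, which is skipped here.
        "-filter_complex", "[0:a][1:a]amix=inputs=2:duration=first[aout]",
        "-map", "0:v", "-map", "[aout]",
        "-c:v", "copy",                   # keep the picture untouched
        video_out,
    ], check=True)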

Maintenance & Community

The project is described as an "early alpha / work-in-progress." Community contact is available via Twitter, Reddit, and Email. No specific contributors, sponsorships, or roadmap details are provided.

Licensing & Compatibility

The project is licensed under the Coqui Public Model License 1.0.0. No specific compatibility notes for commercial use or closed-source linking are mentioned.

Limitations & Caveats

As an early-stage project, bugs and CLI API changes are expected. Achieving perfect lip synchronization, especially after translation, may not always be possible. Speaker detection can be unreliable, and the translation feature is experimental, potentially yielding imperfect results. Synthesis may introduce audio artifacts, and the tool may struggle with complex audio mixes (e.g., voice and singing). Specific version incompatibilities between PyTorch and cuDNN are noted.

Health Check

  • Last Commit: 5 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 9 stars in the last 30 days
