violin by shang-zhu

AI-powered video translation and dubbing

Created 1 month ago
777 stars

Top 44.4% on SourcePulse

Project Summary

Open-source Video Translation Skill

Violin is an open-source video translation skill that automates transcription, translation, and dubbing into 33 languages. It targets engineers and power users who need to localize video, and it produces synchronized, native-sounding voice-overs with optional subtitles so that these steps do not have to be done by hand.

How It Works

The core pipeline uses ffmpeg for audio extraction, Whisper Large v3 for transcription and timestamping, and a configurable LLM (DeepSeek V4 Pro by default) for segment-by-segment translation. Text-to-speech synthesis is handled by a provider such as Cartesia Sonic 3 or ElevenLabs. ffmpeg then re-syncs the video with the synthesized audio and can optionally generate SRT subtitles. The architecture is pluggable, so the provider for each stage can be swapped.
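
The stages above can be sketched in Python as follows. This is an illustrative sketch only, not Violin's actual code or API: the names Segment, Transcriber, Translator, TagTranslator, extract_audio_cmd, mux_cmd, to_srt, and translate_segments are all hypothetical, and the ffmpeg arguments are one plausible choice rather than the ones the project necessarily uses. It shows how timestamped segments can flow through interchangeable providers and end up as SRT text.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Segment:
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


class Transcriber(Protocol):
    """Any transcription provider only needs this one method (hypothetical interface)."""
    def transcribe(self, audio_path: str) -> list[Segment]: ...


class Translator(Protocol):
    """Any translation provider only needs this one method (hypothetical interface)."""
    def translate(self, text: str, target_lang: str) -> str: ...


class TagTranslator:
    """Stand-in translator for the sketch: prefixes text with the language code."""
    def translate(self, text: str, target_lang: str) -> str:
        return f"[{target_lang}] {text}"


def extract_audio_cmd(video: str, audio: str) -> list[str]:
    # -vn drops the video stream; mono 16 kHz is an assumed speech-model input format.
    return ["ffmpeg", "-y", "-i", video, "-vn", "-ac", "1", "-ar", "16000", audio]


def mux_cmd(video: str, dubbed_audio: str, out: str) -> list[str]:
    # Keep the original video stream untouched and take audio from the second input.
    return ["ffmpeg", "-y", "-i", video, "-i", dubbed_audio,
            "-map", "0:v:0", "-map", "1:a:0", "-c:v", "copy", "-shortest", out]


def translate_segments(segments: list[Segment], translator: Translator,
                       target_lang: str) -> list[Segment]:
    # Timestamps are preserved so the dubbed audio can stay aligned with the video.
    return [Segment(s.start, s.end, translator.translate(s.text, target_lang))
            for s in segments]


def srt_timestamp(seconds: float) -> str:
    # SRT uses HH:MM:SS,mmm with a comma before the milliseconds.
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    # Each cue is: index, time range, text, then a blank line between cues.
    blocks = [
        f"{i}\n{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n{seg.text}\n"
        for i, seg in enumerate(segments, start=1)
    ]
    return "\n".join(blocks)
```

Because each stage depends only on a small interface, replacing TagTranslator with a class that calls a real LLM would not require changes to translate_segments or to_srt. The command lists returned by extract_audio_cmd and mux_cmd could be passed to subprocess.run on a machine where ffmpeg is installed.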

Quick Start & Requirements

  • Install: uv tool install violin (recommended) or pip install violin.
  • Prerequisites: Python 3.10+ and ffmpeg on PATH, plus API keys for the chosen providers (e.g., Together AI, OpenAI, ElevenLabs).
  • Demo: Live demo at https://www.violin-ai.com.
  • Docs: Blog post linked from the project page.
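
The prerequisites above can be checked before a first run with a small script like the one below. It is a hedged sketch that is not part of Violin: the preflight function is hypothetical, and the environment variable names a given provider expects are not stated on this page, so the caller supplies them.

```python
import os
import shutil
import sys


def preflight(required_env: list[str],
              min_python: tuple[int, int] = (3, 10)) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Compare only (major, minor) against the minimum version listed above.
    if sys.version_info[:2] < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ is required")
    # shutil.which returns None when the executable is not found on PATH.
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg not found on PATH")
    # The variable names are supplied by the caller; none are assumed here.
    for name in required_env:
        if not os.environ.get(name):
            problems.append(f"environment variable {name} is not set")
    return problems
```

Calling preflight(["EXAMPLE_PROVIDER_API_KEY"]), where EXAMPLE_PROVIDER_API_KEY is a placeholder name, returns one message per missing prerequisite.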

Highlighted Details

  • Supports 33 target languages with native-speaker TTS voices for the 16 most common.
  • Features in-video Q&A for interactive content exploration.
  • Offers natural-language voice selection and 6 experimental style profiles (e.g., kids, academic, news) that adjust both the translation and the TTS output.
  • Pluggable stack supports interchangeable transcription, translation, and TTS providers (Together, OpenAI, ElevenLabs).

Maintenance & Community

This is a personal open-source project. Contributions via PRs are welcome. For questions or bug reports, contact heyviolinai@gmail.com. No dedicated community channels (Discord/Slack) are listed.

Licensing & Compatibility

  • License: MIT License.
  • Compatibility: The MIT License permits commercial use and integration into closed-source applications. Users are responsible for content rights.

Limitations & Caveats

Style profiles are experimental. The project is a personal endeavor. It is intended for Creative Commons, public-domain, or self-owned recordings, so users must ensure they have the rights to translate any content they process.

Health Check

  • Last commit: 5 days ago
  • Responsiveness: Inactive
  • Pull requests (30d): 22
  • Issues (30d): 0
  • Star history: 781 stars in the last 30 days
