elanmart: Cyberpunk-style video translation with a modern DL stack
Top 31.0% on SourcePulse
This project is a proof-of-concept for automatic video translation that mimics the subtitle style of Cyberpunk 2077. It targets users interested in automated translation and subtitling of video content. Its pipeline detects speakers, transcribes their speech, translates it, and overlays the translated subtitles onto the original video.
How It Works
The system integrates multiple pre-trained ML models to achieve its functionality. It uses ffmpeg-python for video and audio processing, Whisper for speech-to-text, NVIDIA NeMo for speaker diarization, DeepL for translation, and RetinaFace with DeepFace for face detection and embedding. Speaker and face IDs are matched using heuristics, and subtitles are generated and overlaid using PIL and OpenCV. The architecture is designed for serverless deployment using Modal and features a Gradio frontend for user interaction.
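Both Whisper and the diarization step work on audio rather than video, so a pipeline like this starts by pulling the soundtrack out of the input file. The sketch below assumes 16 kHz mono 16-bit PCM WAV output, which is the format Whisper resamples to internally. Instead of ffmpeg-python, it calls the ffmpeg CLI directly through subprocess so the exact command is easy to inspect. The function names are hypothetical and are not the project's API.

```python
import subprocess
from typing import List

# Assumption: downstream models want 16 kHz mono audio (Whisper's native rate).
SAMPLE_RATE = 16_000


def build_audio_extract_cmd(video_path: str, wav_path: str,
                            sample_rate: int = SAMPLE_RATE) -> List[str]:
    """Build an ffmpeg command that drops the video and writes mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it already exists
        "-i", video_path,         # input video
        "-vn",                    # discard the video stream
        "-ac", "1",               # downmix to a single channel
        "-ar", str(sample_rate),  # resample to the target rate
        "-acodec", "pcm_s16le",   # 16-bit little-endian PCM
        wav_path,
    ]


def extract_audio(video_path: str, wav_path: str) -> None:
    """Run ffmpeg; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(build_audio_extract_cmd(video_path, wav_path),
                   check=True, capture_output=True)
```

For example, extract_audio("input.mp4", "audio.wav") writes the soundtrack to audio.wav. With ffmpeg-python, a roughly equivalent call is ffmpeg.input("input.mp4").output("audio.wav", ac=1, ar=16000, acodec="pcm_s16le").run(overwrite_output=True).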
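The description only says that speaker and face IDs are matched "using heuristics." One simple heuristic in that spirit is a majority vote. While a diarized speaker is talking, count which face identity appears on screen, then assign the speaker the most frequent one. This is a hypothetical sketch, not the project's actual rule. SpeechSegment, FaceDetection, and match_speakers_to_faces are illustrative names. The inputs are assumed to be NeMo-style speaker segments in seconds and per-frame face IDs, for example from clustering DeepFace embeddings.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class SpeechSegment:
    speaker: str  # diarization label, e.g. "spk0" (hypothetical format)
    start: float  # segment start, seconds
    end: float    # segment end, seconds (exclusive)


@dataclass(frozen=True)
class FaceDetection:
    face_id: str  # identity cluster assigned from face embeddings (assumed)
    time: float   # timestamp of the frame, seconds


def match_speakers_to_faces(
    segments: List[SpeechSegment], faces: List[FaceDetection]
) -> Dict[str, Optional[str]]:
    """Hypothetical majority-vote heuristic: map each speaker to the face
    detected most often during that speaker's segments (None if no face)."""
    votes: Dict[str, Counter] = defaultdict(Counter)
    for seg in segments:
        counter = votes[seg.speaker]  # creates an empty Counter on first access
        for face in faces:
            if seg.start <= face.time < seg.end:
                counter[face.face_id] += 1
    return {
        speaker: (counter.most_common(1)[0][0] if counter else None)
        for speaker, counter in votes.items()
    }
```

This vote fails in the cases listed under Limitations & Caveats. When two faces share the screen, when the speaker is off-camera, or when a segment spans a scene cut, the most frequent face need not be the one talking. The nested loop is O(segments × detections). Sorting detections by time and using bisect would scale better on long videos.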
Quick Start & Requirements
Modal deployment: requires a Modal account, a HuggingFace token, and a DeepL API key. Install with pip install -r requirements-modal.txt and run python cbp_translate/app.py.
Local setup: requires ffmpeg, libsndfile1, git, build-essential, and CUDA/cuDNN. Install dependencies via requirements-local.txt and run the CLI commands.
Additional requirements: ffmpeg and git-lfs (for large files).
Highlighted Details
Modal for serverless cloud deployment, enabling remote execution with minimal boilerplate.
Gradio frontend for an interactive demo experience.
Maintenance & Community
The project is maintained by elanmart. Links to community resources like Discord/Slack are not provided in the README.
Licensing & Compatibility
The README does not state a license, so suitability for commercial use or closed-source linking is unclear.
Limitations & Caveats
This is a proof-of-concept with significant limitations. Processing is slow, taking minutes for a 30-second video. It struggles with videos that contain multiple scenes. The speaker/face matching heuristics are basic and can fail. The pipeline also relies on imperfect tools. It has only been tested on a limited set of examples, and font handling for non-Latin characters is not robust.
Status: Inactive (last updated 3 years ago)