Modelscope_Faster_Whisper_Multi_Subtitle by v3ucn

Subtitle generator for offline bilingual transcription

Created 1 year ago
404 stars

Top 71.8% on SourcePulse

View on GitHub
Project Summary

This project provides a one-click solution for generating bilingual subtitles from audio/video files using Faster-Whisper and ModelScope. It leverages offline large models for translation, offering a convenient and potentially faster alternative to cloud-based services.

How It Works

The system integrates Faster-Whisper for accurate speech-to-text transcription and ModelScope, an open-source platform for large models, for translation. This combination allows for offline processing, reducing reliance on external APIs and potentially improving privacy and speed. The workflow likely involves transcribing audio with Faster-Whisper and then translating the transcribed text using a ModelScope translation model.
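As a rough illustration of that workflow, the sketch below transcribes with faster-whisper, translates each segment with a ModelScope CSANMT pipeline, and writes bilingual SRT entries. The model IDs, file names, and SRT assembly are assumptions for illustration, not the project's exact code:

    # Minimal sketch of the transcribe-then-translate workflow (assumed,
    # not the project's actual pipeline).
    # Requires: pip install faster-whisper modelscope
    from faster_whisper import WhisperModel
    from modelscope.pipelines import pipeline
    from modelscope.utils.constant import Tasks

    # 1. Transcribe with Faster-Whisper (model name and file are illustrative).
    asr = WhisperModel("large-v3", device="cuda", compute_type="float16")
    segments, info = asr.transcribe("input.mp3", beam_size=5)

    # 2. Translate each segment offline with a ModelScope CSANMT model
    #    (en -> zh here; swap the model ID for other language pairs).
    translator = pipeline(task=Tasks.translation,
                          model="damo/nlp_csanmt_translation_en2zh")

    def fmt(t):
        # Format seconds as an SRT timestamp: HH:MM:SS,mmm
        h, rem = divmod(t, 3600)
        m, s = divmod(rem, 60)
        return f"{int(h):02d}:{int(m):02d}:{s:06.3f}".replace(".", ",")

    # 3. Emit bilingual SRT entries: source text plus its translation.
    with open("output.srt", "w", encoding="utf-8") as srt:
        for i, seg in enumerate(segments, start=1):
            zh = translator(input=seg.text)["translation"]
            srt.write(f"{i}\n{fmt(seg.start)} --> {fmt(seg.end)}\n"
                      f"{seg.text.strip()}\n{zh}\n\n")

Swapping the CSANMT model ID (e.g., a zh2en variant) would cover the other supported language pairs in the same loop.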

Quick Start & Requirements

  • Installation: Create a Conda environment (conda create -n venv python=3.9, conda activate venv) and install dependencies (pip install -r requirements.txt, pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118).
  • Prerequisites: FFmpeg (install via Conda, apt, brew, or winget), python3.9-distutils, and libsox-dev (Ubuntu/Debian). Requires downloading the whisper-large-v3-turbo model. Ollama serves the conversational translation models (e.g., ollama run qwen2:7b); see the sketch after this list.
  • Usage: Run the application with python3 app.py.
  • Supported Languages: Currently supports Chinese-English, English-Chinese, Japanese-Chinese, and Korean-Chinese bilingual subtitles.
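For the Ollama route mentioned in the prerequisites, translation can be served by a locally running Ollama instance (default port 11434). A minimal sketch, assuming a qwen2:7b model pulled via ollama run qwen2:7b; the prompt wording and helper name are illustrative, not the project's code:

    # Translate one subtitle line via a local Ollama server.
    import json
    import urllib.request

    def ollama_translate(text, target="Chinese", model="qwen2:7b"):
        payload = json.dumps({
            "model": model,
            "prompt": f"Translate the following subtitle into {target}. "
                      f"Reply with the translation only:\n{text}",
            "stream": False,  # return one JSON object instead of a stream
        }).encode("utf-8")
        req = urllib.request.Request(
            "http://localhost:11434/api/generate",  # Ollama's default endpoint
            data=payload, headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["response"].strip()

    print(ollama_translate("Hello, world!"))  # e.g. "你好，世界！"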

Highlighted Details

  • Leverages Faster-Whisper for efficient and accurate transcription.
  • Utilizes ModelScope for offline large model-based translation.
  • Supports multiple bilingual subtitle combinations (e.g., Chinese-English, English-Chinese).
  • Includes instructions for setting up Ollama for conversational translation.

Maintenance & Community

The project credits faster-whisper and CSANMT (the ModelScope translation model). Further community or maintenance details are not provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. The use of Faster-Whisper and ModelScope implies adherence to their respective licenses. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project supports only the listed bilingual subtitle pairs, so other language combinations may not work out of the box. Because transcription and translation run on offline large models, expect significant local resource demands (GPU memory and disk space for model weights).

Health Check

  • Last Commit: 11 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 30 days
