Modelscope_Faster_Whisper_Multi_Subtitle by v3ucn

Subtitle generator for offline bilingual transcription

Created 1 year ago
404 stars

Top 71.8% on SourcePulse

View on GitHub
Project Summary

This project provides a one-click solution for generating bilingual subtitles from audio/video files using Faster-Whisper and ModelScope. It leverages offline large models for translation, offering a convenient and potentially faster alternative to cloud-based services.

How It Works

The system integrates Faster-Whisper for accurate speech-to-text transcription and ModelScope, an open-source platform for large models, for translation. This combination allows for offline processing, reducing reliance on external APIs and potentially improving privacy and speed. The workflow likely involves transcribing audio with Faster-Whisper and then translating the transcribed text using a ModelScope translation model.
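As a rough illustration of that workflow, the sketch below transcribes with faster-whisper, translates each segment with a ModelScope CSANMT pipeline, and writes bilingual SRT entries. The model IDs, file names, and SRT assembly are assumptions for illustration, not the project's exact code:

    # Minimal sketch of the transcribe-then-translate workflow (assumed,
    # not the project's actual pipeline).
    # Requires: pip install faster-whisper modelscope
    from faster_whisper import WhisperModel
    from modelscope.pipelines import pipeline
    from modelscope.utils.constant import Tasks

    # 1. Transcribe with Faster-Whisper (model name and file are illustrative).
    asr = WhisperModel("large-v3", device="cuda", compute_type="float16")
    segments, info = asr.transcribe("input.mp3", beam_size=5)

    # 2. Translate each segment offline with a ModelScope CSANMT model
    #    (en -> zh here; swap the model ID for other language pairs).
    translator = pipeline(task=Tasks.translation,
                          model="damo/nlp_csanmt_translation_en2zh")

    def fmt(t):
        # Format seconds as an SRT timestamp: HH:MM:SS,mmm
        h, rem = divmod(t, 3600)
        m, s = divmod(rem, 60)
        return f"{int(h):02d}:{int(m):02d}:{s:06.3f}".replace(".", ",")

    # 3. Emit bilingual SRT entries: source text plus its translation.
    with open("output.srt", "w", encoding="utf-8") as srt:
        for i, seg in enumerate(segments, start=1):
            zh = translator(input=seg.text)["translation"]
            srt.write(f"{i}\n{fmt(seg.start)} --> {fmt(seg.end)}\n"
                      f"{seg.text.strip()}\n{zh}\n\n")

Swapping the CSANMT model ID (e.g., a zh2en variant) would cover the other supported language pairs in the same loop.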

Quick Start & Requirements

  • Installation: Create a Conda environment (conda create -n venv python=3.9, conda activate venv) and install dependencies (pip install -r requirements.txt, pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118).
  • Prerequisites: FFmpeg (install via Conda, apt, brew, or winget), python3.9-distutils, and libsox-dev (Ubuntu/Debian). Requires downloading the whisper-large-v3-turbo model. Ollama serves the conversational translation models (e.g., ollama run qwen2:7b); see the sketch after this list.
  • Usage: Run the application with python3 app.py.
  • Supported Languages: Currently supports Chinese-English, English-Chinese, Japanese-Chinese, and Korean-Chinese bilingual subtitles.
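For the Ollama route mentioned in the prerequisites, translation can be served by a locally running Ollama instance (default port 11434). A minimal sketch, assuming a qwen2:7b model pulled via ollama run qwen2:7b; the prompt wording and helper name are illustrative, not the project's code:

    # Translate one subtitle line via a local Ollama server.
    import json
    import urllib.request

    def ollama_translate(text, target="Chinese", model="qwen2:7b"):
        payload = json.dumps({
            "model": model,
            "prompt": f"Translate the following subtitle into {target}. "
                      f"Reply with the translation only:\n{text}",
            "stream": False,  # return one JSON object instead of a stream
        }).encode("utf-8")
        req = urllib.request.Request(
            "http://localhost:11434/api/generate",  # Ollama's default endpoint
            data=payload, headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["response"].strip()

    print(ollama_translate("Hello, world!"))  # e.g. "你好，世界！"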

Highlighted Details

  • Leverages Faster-Whisper for efficient and accurate transcription.
  • Utilizes ModelScope for offline large model-based translation.
  • Supports multiple bilingual subtitle combinations (e.g., Chinese-English, English-Chinese).
  • Includes instructions for setting up Ollama for conversational translation.

Maintenance & Community

The project credits faster-whisper and CSANMT (the ModelScope translation model). Further community or maintenance details are not provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. The use of Faster-Whisper and ModelScope implies adherence to their respective licenses. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project supports only the listed bilingual subtitle pairs, so other language combinations may not work out of the box. Because transcription and translation run on offline large models, expect significant local resource demands (GPU memory and disk space for model weights).

Health Check

  • Last Commit: 11 months ago
  • Responsiveness: Inactive
  • Pull Requests (30d): 0
  • Issues (30d): 0
  • Star History: 4 stars in the last 30 days
