Modelscope_Faster_Whisper_Multi_Subtitle by v3ucn

Subtitle generator for offline bilingual transcription

created 1 year ago
391 stars

Top 74.6% on sourcepulse

Project Summary

This project provides a one-click solution for generating bilingual subtitles using Faster-Whisper and ModelScope, targeting users who need to create dual-language subtitles from audio/video files. It leverages offline large models for translation, offering a convenient and potentially faster alternative to cloud-based services.

How It Works

The system integrates Faster-Whisper for accurate speech-to-text transcription and ModelScope, an open-source platform for large models, for translation. This combination allows for offline processing, reducing reliance on external APIs and potentially improving privacy and speed. The workflow likely involves transcribing audio with Faster-Whisper and then translating the transcribed text using a ModelScope translation model.
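
A minimal sketch of that likely flow, assuming the public faster-whisper and ModelScope Python APIs (the model IDs, device settings, and wiring here are illustrative assumptions, not this repo's actual code):

```python
# Sketch: transcribe with Faster-Whisper, then translate each timestamped
# segment with an offline ModelScope CSANMT translation model.
# Model IDs and settings are assumptions, not taken from this repo.
from faster_whisper import WhisperModel
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# Speech-to-text: faster-whisper yields segments with start/end timestamps.
asr = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, info = asr.transcribe("input.wav", beam_size=5)

# Translation: a CSANMT English-to-Chinese model pulled from ModelScope.
translator = pipeline(task=Tasks.translation,
                      model="damo/nlp_csanmt_translation_en2zh")

for seg in segments:
    src = seg.text.strip()
    dst = translator(input=src)["translation"]
    print(f"[{seg.start:7.2f} -> {seg.end:7.2f}] {src} / {dst}")
```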

Quick Start & Requirements

  • Installation: Create a Conda environment (conda create -n venv python=3.9, conda activate venv) and install dependencies (pip install -r requirements.txt, pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118).
  • Prerequisites: FFmpeg (install via Conda, apt, brew, or winget), python3.9-distutils, libsox-dev (Ubuntu/Debian). Requires downloading the whisper-large-v3-turbo model. Ollama is used for conversational translation models (e.g., ollama run qwen2:7b).
  • Usage: Run the application with python3 app.py. A sketch of how the resulting bilingual SRT output can be assembled follows this list.
  • Supported Languages: Currently supports Chinese-English, English-Chinese, Japanese-Chinese, and Korean-Chinese bilingual subtitles.
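
For reference, once timestamped source/translation pairs exist, bilingual SRT output can be assembled with a small helper like the hypothetical one below (not the project's code):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 00:01:02,500."""
    total_ms = int(round(seconds * 1000))
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def write_bilingual_srt(entries, path="output.srt"):
    """entries: iterable of (start, end, source_text, translated_text)."""
    with open(path, "w", encoding="utf-8") as f:
        for i, (start, end, src, dst) in enumerate(entries, 1):
            f.write(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n")
            f.write(f"{src}\n{dst}\n\n")

# Example: one cue with both languages stacked on two lines.
write_bilingual_srt([(0.0, 2.5, "Hello, world.", "你好，世界。")])
```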

Highlighted Details

  • Leverages Faster-Whisper for efficient and accurate transcription.
  • Utilizes ModelScope for offline large model-based translation.
  • Supports multiple bilingual subtitle combinations (e.g., Chinese-English, English-Chinese).
  • Includes instructions for setting up Ollama for conversational (LLM-based) translation; see the sketch below.
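
The Ollama route presumably swaps the ModelScope model for a local chat model prompted to translate. A hedged sketch against Ollama's standard /api/generate REST endpoint (the prompt wording and model choice are illustrative):

```python
import requests

def ollama_translate(text: str, model: str = "qwen2:7b") -> str:
    """Translate text via a local Ollama server (default port 11434).
    Prompt wording is an illustrative assumption, not the repo's prompt."""
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": model,
            "prompt": ("Translate the following sentence into Chinese. "
                       "Output only the translation:\n" + text),
            "stream": False,
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"].strip()

print(ollama_translate("Subtitles make videos easier to follow."))
```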

Maintenance & Community

The project credits faster-whisper and CSANMT (the ModelScope translation model). Further community or maintenance details are not provided in the README.

Licensing & Compatibility

The README does not explicitly state a license. The use of Faster-Whisper and ModelScope implies adherence to their respective licenses. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The project supports only the listed bilingual subtitle pairs; other language combinations may not work out of the box. Running large models offline also implies substantial local resources: disk space for the downloaded models and, in practice, GPU memory for whisper-large-v3-turbo and the translation models.

Health Check

  • Last commit: 8 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 0
  • Star history: 18 stars in the last 90 days

Explore Similar Projects

Starred by Boris Cherny (creator of Claude Code; MTS at Anthropic), Andrej Karpathy (founder of Eureka Labs; formerly at Tesla and OpenAI; author of CS 231n), and 19 more.

  • whisper by openai — speech recognition model for multilingual transcription/translation. Top 0.4% · 86k stars · created 2 years ago · updated 1 month ago.