Subtitle tool for video transcription, translation, and editing using LLMs
Top 5.6% on sourcepulse
VideoCaptioner is an LLM-powered tool for comprehensive video subtitling, designed for content creators and video editors. It automates speech-to-text transcription, intelligent sentence segmentation, correction, and translation, streamlining the entire subtitling workflow for enhanced video accessibility and engagement.
How It Works
The tool leverages Large Language Models (LLMs) for advanced subtitle processing, including natural sentence segmentation and context-aware correction and translation. It supports both online and local (GPU-accelerated) speech recognition via Whisper variants, offering flexibility and privacy. The "translate-reflect-translate" methodology is employed for high-quality translations.
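The translate-reflect-translate loop can be sketched as below. This is a minimal illustration of the three-pass idea, not the project's actual code: the `llm` callable and the prompt wording are hypothetical.

```python
def translate_reflect_translate(text, llm, target_lang="English"):
    """Three-pass translation: draft, critique, refine.

    `llm` is any callable mapping a prompt string to a response string;
    the prompts below are illustrative, not VideoCaptioner's actual ones.
    """
    # Pass 1: produce a draft translation.
    draft = llm(f"Translate into {target_lang}: {text}")
    # Pass 2: ask the model to reflect on the draft's accuracy and fluency.
    critique = llm(
        f"Source: {text}\nDraft translation: {draft}\n"
        "List concrete problems with accuracy, tone, and fluency."
    )
    # Pass 3: retranslate, taking the critique into account.
    final = llm(
        f"Source: {text}\nDraft: {draft}\nCritique: {critique}\n"
        f"Produce an improved {target_lang} translation."
    )
    return final

# A stub LLM makes the control flow visible without any API calls.
def stub_llm(prompt):
    if prompt.startswith("Translate into"):
        return "draft"
    if "List concrete problems" in prompt:
        return "critique"
    return "final"

print(translate_reflect_translate("你好", stub_llm))  # → final
```

The reflection pass is the point of the pattern: the model critiques its own draft before the final pass, which tends to catch mistranslated idioms and tone mismatches that a single pass misses.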
Quick Start & Requirements
Install Python dependencies (`pip install -r requirements.txt`) and run `python main.py`. Requires `ffmpeg`, `aria2`, and Python 3.x; the same dependencies apply on macOS. Local Whisper requires downloading models (e.g., `medium` for Chinese, `small` for English). LLM API keys are needed for advanced features.
Maintenance & Community
The project is maintained by a university student and sees frequent updates. Community contributions are encouraged via Issues and Pull Requests, and the README links to documentation and community channels.
Licensing & Compatibility
The repository's license is not explicitly stated in the provided README text. Compatibility for commercial use would depend on the specific license chosen by the author.
Limitations & Caveats
The macOS version lacks pre-built executables and local Whisper support. Docker deployment is noted as beta and may require updates. The project relies on external LLM APIs, which may incur costs and are subject to provider stability.