VideoCaptioner by WEIFENG2333

Subtitle tool for video transcription, translation, and editing using LLMs

created 9 months ago
9,181 stars

Top 5.6% on sourcepulse

Project Summary

VideoCaptioner is an LLM-powered subtitling tool aimed at content creators and video editors. It automates speech-to-text transcription, sentence segmentation, correction, and translation, covering the subtitling workflow from raw audio to finished subtitles.

How It Works

The tool uses Large Language Models (LLMs) to process subtitles: it splits transcripts into natural sentences and applies context-aware correction and translation. Speech recognition runs either online or locally (GPU-accelerated) via Whisper variants, so users can choose between convenience and keeping audio on their own machine. Translation follows a "translate-reflect-translate" method, in which a first translation is reviewed and then revised.
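The "translate-reflect-translate" idea can be sketched as three chained LLM calls. This is a minimal illustration, not the project's actual implementation: the function name, the prompt wording, and the `llm` callable (any function that takes a prompt string and returns the model's reply) are all assumptions made for the example.

```python
from typing import Callable, Dict


def translate_reflect_translate(
    text: str,
    target_lang: str,
    llm: Callable[[str], str],  # hypothetical: prompt in, model reply out
) -> Dict[str, str]:
    """Run three passes: draft translation, critique, revised translation."""
    # Pass 1: produce a first-draft translation.
    draft = llm(f"Translate this subtitle into {target_lang}:\n{text}")

    # Pass 2: ask the model to review its own draft.
    critique = llm(
        f"Review this {target_lang} translation for accuracy and fluency "
        f"and list concrete problems.\nSource: {text}\nTranslation: {draft}"
    )

    # Pass 3: revise the draft using the review.
    final = llm(
        "Rewrite the translation so that it addresses the review.\n"
        f"Source: {text}\nDraft: {draft}\nReview: {critique}"
    )
    return {"draft": draft, "critique": critique, "final": final}
```

Passing the model in as a plain callable keeps the sketch independent of any particular provider; a wrapper around whichever LLM API is configured could be supplied in its place.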

Quick Start & Requirements

  • Windows: Download the standalone executable (under 60 MB, includes dependencies).
  • macOS: Clone the repository, install dependencies (pip install -r requirements.txt), and run python main.py. Requires ffmpeg, aria2, and Python 3.
  • Docker: Build and run the provided Dockerfile.
  • Dependencies: ffmpeg, aria2 (macOS), Python 3.x. Local Whisper requires downloading models (e.g., medium for Chinese, small for English). LLM API keys are needed for advanced features.
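Before running from source, it can help to confirm that the external tools listed above are on the PATH. The following is a small sketch using only the Python standard library; the helper name `missing_tools` is hypothetical, and it assumes the aria2 package is installed under its usual executable name, `aria2c`.

```python
import shutil


def missing_tools(tools=("ffmpeg", "aria2c")):
    """Return the names of command-line tools that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found.")
```

The check only verifies that each executable can be located; it does not validate versions.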

Highlighted Details

  • Supports batch subtitle generation across multiple videos.
  • Offers various subtitle style templates (e.g., popular science, news, anime).
  • Integrates with multiple LLM providers (OpenAI, DeepSeek, Ollama) and translation services (DeepL, Microsoft, Google).
  • Improves subtitle accuracy with VAD (voice activity detection) filtering, voice separation, and word-level timestamps.
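To illustrate why word-level timestamps matter, the sketch below groups timed words into subtitle cues and renders them in SRT format. It uses a simple pause-and-length heuristic as a stand-in; the project itself is described as using an LLM for segmentation, so this is not its method. The `Word` type, the function names, and the `max_gap` and `max_chars` thresholds are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def group_words(words: List[Word], max_gap: float = 0.6,
                max_chars: int = 42) -> List[List[Word]]:
    """Start a new cue after a long pause or when the line gets too long."""
    cues: List[List[Word]] = []
    current: List[Word] = []
    for w in words:
        if current:
            gap = w.start - current[-1].end
            length = len(" ".join(x.text for x in current)) + 1 + len(w.text)
            if gap > max_gap or length > max_chars:
                cues.append(current)
                current = []
        current.append(w)
    if current:
        cues.append(current)
    return cues


def to_srt(cues: List[List[Word]]) -> str:
    """Render cues as numbered SRT blocks separated by blank lines."""
    blocks = []
    for i, cue in enumerate(cues, 1):
        text = " ".join(w.text for w in cue)
        span = f"{srt_time(cue[0].start)} --> {srt_time(cue[-1].end)}"
        blocks.append(f"{i}\n{span}\n{text}\n")
    return "\n".join(blocks)
```

Because each cue takes its start from its first word and its end from its last, cue boundaries follow the speech rather than fixed-length windows.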

Maintenance & Community

The project is maintained by a university student, and frequent updates indicate active development. Contributions are welcome through Issues and Pull Requests. The README links to the documentation and community channels.

Licensing & Compatibility

The repository's license is not explicitly stated in the provided README text. Compatibility for commercial use would depend on the specific license chosen by the author.

Limitations & Caveats

The macOS version lacks pre-built executables and local Whisper support. Docker deployment is noted as beta and may require updates. The project relies on external LLM APIs, which may incur costs and are subject to provider stability.

Health Check

  • Last commit: 6 days ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 5
  • Issues (30d): 36

Star History

2,757 stars in the last 90 days
