subgen by McCloudS

Subtitle generator for media servers

Created 3 years ago

1,228 stars

Top 31.7% on SourcePulse

Project Summary

This project provides an automated subtitle generation service for media files, leveraging OpenAI's Whisper model. It integrates with popular media servers like Jellyfin, Plex, and Emby, as well as Bazarr, to automatically create SRT or LRC subtitle files from audio or video content. The primary benefit is ensuring all media has subtitles, catering to users who require them for accessibility or preference.

How It Works

The service responds to webhooks fired by media server events (e.g., new media added, playback started) or is called directly by Bazarr. It transcribes with faster-whisper and stable-ts, and lets you choose the model (from tiny to large-v3-turbo) and the compute device (CPU, or GPU via CUDA). Audio can be transcribed in its original language or translated to English, and configuration options cover language detection, skipping media that already has subtitles, and preferred audio tracks.
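
As a rough illustration of the last step in that pipeline, the sketch below turns timed transcript segments into SRT text. It does not call faster-whisper or stable-ts; the (start, end, text) tuple shape and the function names srt_timestamp and segments_to_srt are assumptions made for this example, not the project's own code.

```python
from typing import Iterable, Tuple

# Assumed segment shape: (start seconds, end seconds, text).
Segment = Tuple[float, float, str]


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as numbered SRT cues separated by blank lines."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    return "\n".join(cues)
```

Writing the returned string to a file next to the media (for example film.srt beside film.mkv) is what lets a media server pick the subtitles up; the exact file naming the project uses is not covered here.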

Quick Start & Requirements

Installation: Docker image mccloud/subgen:latest or mccloud/subgen:cpu.
Prerequisites: Python 3.9-3.11 (for standalone), FFmpeg, NVIDIA drivers with CUDA Toolkit >= 12.2.2 for GPU acceleration.
Setup: Mount media volumes identically to your media server. Refer to the official documentation for detailed setup instructions and webhook configurations.
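
The reason for identical mounts is presumably that the media server reports file paths as it sees them, and the subtitle service must be able to open the same path. The sketch below shows what would otherwise be needed: a translation from the server's root to the local one. The function remap_path and both root arguments are hypothetical names for this example and are not taken from the project.

```python
from pathlib import PurePosixPath


def remap_path(reported: str, server_root: str, local_root: str) -> str:
    """Translate a path reported by the media server into the local mount.

    Paths outside server_root are returned unchanged.
    """
    reported_path = PurePosixPath(reported)
    try:
        relative = reported_path.relative_to(PurePosixPath(server_root))
    except ValueError:
        # The reported path is not under server_root, so there is nothing to map.
        return reported
    return str(PurePosixPath(local_root) / relative)
```

With identical mounts, server_root and local_root are the same and the function returns its input, which is why matching volumes remove the need for any mapping.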

Highlighted Details

Supports a wide range of Whisper models, including large-v3-turbo and distil variants.
Offers both transcription and translation capabilities, with granular control over audio track selection and language preferences.
Integrates with Jellyfin, Plex, Emby, and Tautulli via webhooks, and can function as a Whisper provider for Bazarr.
Includes features like MONITOR for folder watching, LRC_FOR_AUDIO_FILES for audio-specific formats, and advanced subtitle regrouping.
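
Since configuration is done through environment variables, options such as MONITOR and LRC_FOR_AUDIO_FILES can be pictured as boolean flags read at startup. The sketch below is a minimal illustration under that assumption: the accepted truthy strings, the defaults, the audio extension list, and the helper names env_flag and subtitle_path are all invented for this example, and the output naming is simplified.

```python
import os
from pathlib import Path

# Illustrative subset of audio extensions; an assumption, not the project's list.
AUDIO_EXTENSIONS = {".mp3", ".flac", ".wav", ".m4a", ".ogg"}


def env_flag(name: str, default: bool = False, environ=None) -> bool:
    """Read a boolean option from the environment (or a supplied mapping)."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def subtitle_path(media_path: str, lrc_for_audio: bool) -> Path:
    """Pick .lrc for audio files when the flag is set, otherwise .srt."""
    path = Path(media_path)
    is_audio = path.suffix.lower() in AUDIO_EXTENSIONS
    return path.with_suffix(".lrc" if (is_audio and lrc_for_audio) else ".srt")
```

For example, subtitle_path("/music/song.flac", env_flag("LRC_FOR_AUDIO_FILES")) yields an .lrc path only when the variable is set to a truthy value, while video files always map to .srt.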

Maintenance & Community

The project is actively maintained, with frequent updates addressing bugs and adding features. Community support and discussions are available via GitHub Discussions.

Licensing & Compatibility

No license is identified for the project, so its suitability for commercial use or closed-source linking is not stated.

Limitations & Caveats

The project is developed by an individual without formal deployment experience, and transcription accuracy depends on the performance of the chosen Whisper model. Some features, such as the web UI, have been removed in favor of configuration through environment variables.

Health Check

Last Commit: 1 week ago
Responsiveness: 1 day

Star History: 44 stars in the last 30 days