Pandrator by lukaszliniewicz

GUI framework for audiobook, subtitle, and dubbing generation

created 1 year ago
483 stars

Top 64.4% on sourcepulse

Project Summary

Pandrator is a free, GUI-driven application for transforming text-based documents (PDF, EPUB) and video files into audiobooks, subtitles, and dubbed videos. It targets users who want to leverage local AI models for text-to-speech, voice cloning, and translation without complex setup, offering a user-friendly interface and all-in-one packages.

How It Works

Pandrator acts as a framework that orchestrates several open-source AI tools. For audiobooks, it extracts text from PDF, EPUB, or plain-text files and splits it into manageable segments for TTS engines such as XTTS or Silero. It supports voice cloning via XTTS, optionally enhanced by RVC, and can use an LLM to preprocess the text so that it reads more naturally. For dubbing, it transcribes the video's audio with WhisperX, translates the resulting subtitles via various APIs or local LLMs, synthesizes new audio, and finally synchronizes that audio with the video.
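The segment-splitting step can be illustrated with a minimal sketch. This is not Pandrator's actual implementation: the function name `split_into_segments`, the `max_chars` limit, and the naive sentence-boundary rule are all assumptions chosen for illustration. It greedily packs whole sentences into segments short enough to hand to a TTS engine.

```python
import re


def split_into_segments(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into segments of at most max_chars.

    Hypothetical helper for illustration; not taken from Pandrator.
    """
    # Naive sentence boundary: whitespace that follows '.', '!' or '?'.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    sentences = [p.strip() for p in parts if p.strip()]

    segments: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            # The sentence still fits into the current segment.
            current = candidate
        else:
            if current:
                segments.append(current)
            # A single sentence longer than max_chars is kept whole.
            current = sentence
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    for segment in split_into_segments("One. Two. Three.", max_chars=9):
        print(segment)
```

A real pipeline would likely need a more robust sentence splitter (abbreviations, quotations, language-specific punctuation) and a fallback for sentences that exceed the limit; the sketch only shows the general shape of the step.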

Quick Start & Requirements

  • Installation: Precompiled archives are available for download and only need to be unpacked. A Windows GUI installer and launcher are also provided.
  • Prerequisites: Requires a reasonably modern CPU (4+ cores recommended for XTTS), an NVIDIA GPU with 4GB+ VRAM for good XTTS performance, FFmpeg, and Python 3.10+. Optional components like RVC, WhisperX, and XTTS fine-tuning require specific CUDA versions (e.g., CUDA 11.8 for XTTS).
  • Setup Time: Initial setup can take from a few minutes to 30 minutes depending on selected components and download speeds.
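Some of the prerequisites listed above can be checked before installing. The following is a rough sketch, not part of Pandrator: the function name `check_prerequisites` is hypothetical, and treating the presence of `nvidia-smi` on the PATH as a sign of a usable NVIDIA driver is an assumption (it says nothing about VRAM or the CUDA version).

```python
import shutil
import sys


def check_prerequisites(min_python: tuple[int, int] = (3, 10)) -> dict[str, bool]:
    """Report which basic prerequisites appear to be present.

    Hypothetical helper for illustration; not taken from Pandrator.
    """
    return {
        # Python 3.10+ is listed as a requirement.
        "python": sys.version_info >= min_python,
        # FFmpeg must be discoverable on the PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # Assumption: nvidia-smi on the PATH suggests an NVIDIA driver is installed.
        "nvidia_smi": shutil.which("nvidia-smi") is not None,
    }


if __name__ == "__main__":
    for name, present in check_prerequisites().items():
        print(f"{name}: {'found' if present else 'missing'}")
```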

Highlighted Details

  • Supports XTTS and Silero TTS engines, with XTTS recommended for its multilingual capabilities and voice cloning.
  • Features instant voice cloning with XTTS, enhanced by RVC, and XTTS fine-tuning capabilities.
  • Offers a comprehensive dubbing workflow including transcription (WhisperX), translation (various APIs including local LLMs), and audio synchronization.
  • Includes advanced text preprocessing options, LLM integration for text refinement, and NISQA for audio quality evaluation.

Maintenance & Community

The project is actively developed by a self-identified "noob" developer seeking contributions and feedback. A Discord server is available for community interaction and support.

Licensing & Compatibility

The project's primary dependencies include open-source libraries. Specific licensing details for each component (e.g., XTTS, WhisperX) should be reviewed, as they may have their own terms of use. The project itself appears to be available under a permissive license, but this is not explicitly stated in the README.

Limitations & Caveats

Pandrator is at an alpha stage; the developer notes that the code is not optimized and may lack features or reliability. On Linux, installation must be done manually. Antivirus software may flag the Windows installer/launcher. Some advanced features require separate setup of external APIs and models.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull Requests (30d): 0
  • Issues (30d): 0

Star History

28 stars in the last 90 days
