CLI tool for local PDF-to-podcast conversion using LLMs and TTS
This project provides a local, AI-powered tool to transform PDF documents into audio content like podcasts. It's designed for researchers, content creators, and anyone needing to convert textual information into spoken-word formats, offering customization in style, length, and speaker voices.
How It Works
The tool processes PDFs through a four-step pipeline. It begins with text extraction and chunking, followed by transcript generation using LLMs. A third step optimizes the transcript for Text-to-Speech (TTS) conversion, structuring it for natural conversation and specific formats (e.g., interview, podcast). Finally, it generates audio segments using specified TTS models and voices, concatenating them into a final audio file. The system supports various LLM and TTS providers, including local options like Ollama and LMStudio, and cloud services like OpenAI and Groq.
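The four steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names (chunk_text, run_pipeline) and the generate, optimize, and synthesize callables are hypothetical stand-ins for whichever LLM and TTS providers are configured. It assumes the PDF text has already been extracted and that the TTS step returns headerless PCM bytes.

```python
from typing import Callable, List, Tuple


def chunk_text(text: str, max_chars: int = 2000) -> List[str]:
    """Split text into whitespace-delimited chunks of at most max_chars.

    A single word longer than max_chars is kept whole as its own chunk.
    """
    chunks: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def run_pipeline(
    text: str,
    generate: Callable[[str], str],                    # hypothetical LLM call
    optimize: Callable[[str], List[Tuple[str, str]]],  # hypothetical TTS-prep call
    synthesize: Callable[[str, str], bytes],           # hypothetical TTS call
    max_chars: int = 2000,
) -> bytes:
    # Step 1: chunk the already-extracted PDF text.
    chunks = chunk_text(text, max_chars)
    # Step 2: draft a transcript, one LLM call per chunk.
    draft = "\n".join(generate(chunk) for chunk in chunks)
    # Step 3: restructure the draft into (speaker, line) turns for TTS.
    turns = optimize(draft)
    # Step 4: synthesize each turn and concatenate the segments.
    # Plain byte concatenation is only valid for headerless PCM audio;
    # container formats such as WAV or MP3 need a proper audio library.
    return b"".join(synthesize(speaker, line) for speaker, line in turns)


if __name__ == "__main__":
    # Stub providers so the sketch runs without any model installed.
    audio = run_pipeline(
        "a b c d",
        generate=str.upper,
        optimize=lambda d: [("host", line) for line in d.splitlines()],
        synthesize=lambda speaker, line: f"{speaker}:{line};".encode(),
        max_chars=3,
    )
    print(audio)
```

Passing the providers in as callables mirrors the description above, where local backends (Ollama, LMStudio) and cloud backends (OpenAI, Groq) are interchangeable; the real tool's configuration mechanism may differ.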
Quick Start & Requirements
Install with pip install local-notebooklm, or clone the repository and run pip install -r requirements.txt. Docker Compose is also available.
Highlighted Details
Maintenance & Community
Licensing & Compatibility
Limitations & Caveats
The README does not specify a license, which may affect commercial use or integration into closed-source projects. Language support depends on the chosen LLM and TTS models, so users should verify output quality for non-English content. PDF extraction may fail on image-based or password-protected documents.