Local-NotebookLM by Goekdeniz-Guelmez

CLI tool for local PDF-to-podcast conversion using LLMs and TTS

created 7 months ago
340 stars

Top 82.2% on sourcepulse

Project Summary

This project provides a local, AI-powered tool to transform PDF documents into audio content like podcasts. It's designed for researchers, content creators, and anyone needing to convert textual information into spoken-word formats, offering customization in style, length, and speaker voices.

How It Works

The tool processes PDFs through a four-step pipeline. It begins with text extraction and chunking, followed by transcript generation using LLMs. A third step optimizes the transcript for Text-to-Speech (TTS) conversion, structuring it for natural conversation and specific formats (e.g., interview, podcast). Finally, it generates audio segments using specified TTS models and voices, concatenating them into a final audio file. The system supports various LLM and TTS providers, including local options like Ollama and LMStudio, and cloud services like OpenAI and Groq.
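
The four steps can be sketched in plain Python to show how data flows between them. This is a minimal illustration, not the project's actual code: every function name here (chunk_text, generate_transcript, optimize_for_tts, synthesize, run_pipeline) is hypothetical, the LLM and TTS backends are passed in as ordinary callables, PDF text extraction is assumed to have happened already, and joining raw bytes stands in for real audio concatenation.

```python
from collections.abc import Callable


def chunk_text(text: str, max_chars: int = 1000) -> list[str]:
    """Step 1 (chunking only): split already-extracted text on word boundaries."""
    chunks: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def generate_transcript(chunks: list[str], llm: Callable[[str], str]) -> str:
    """Step 2: ask the LLM (any callable) to turn each chunk into transcript text."""
    return "\n".join(llm(chunk) for chunk in chunks)


def optimize_for_tts(transcript: str, speakers: list[str]) -> list[tuple[str, str]]:
    """Step 3: structure the transcript as (speaker, line) turns.

    Alternating speakers line by line is a placeholder for the
    LLM-driven rewriting the summary describes.
    """
    lines = [ln.strip() for ln in transcript.splitlines() if ln.strip()]
    return [(speakers[i % len(speakers)], line) for i, line in enumerate(lines)]


def synthesize(turns: list[tuple[str, str]], tts: Callable[[str, str], bytes]) -> bytes:
    """Step 4: render each turn and concatenate the segments.

    Joining bytes is a stand-in; real audio needs format-aware concatenation.
    """
    return b"".join(tts(speaker, line) for speaker, line in turns)


def run_pipeline(
    text: str,
    llm: Callable[[str], str],
    tts: Callable[[str, str], bytes],
    speakers: list[str],
    max_chars: int = 1000,
) -> bytes:
    """Chain the four steps on text that was already extracted from a PDF."""
    chunks = chunk_text(text, max_chars)
    transcript = generate_transcript(chunks, llm)
    turns = optimize_for_tts(transcript, speakers)
    return synthesize(turns, tts)
```

Passing the backends as callables mirrors the provider flexibility described above: a local server or a cloud service could sit behind the same two function signatures, and stub functions make the flow easy to test without any model running.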

Quick Start & Requirements

  • Installation: pip install local-notebooklm or clone the repository and run pip install -r requirements.txt. Docker Compose is also available.
  • Prerequisites: Python 3.12+, local LLM/TTS servers (optional), 8GB+ RAM (16GB+ recommended), 10GB+ disk space.
  • Documentation: GitHub Repository
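
Before installing, the Python version and free disk space listed above can be checked with a few lines of standard-library code. This is a rough sketch under stated assumptions: check_prerequisites is a hypothetical helper, not part of the project; "10GB" is interpreted here as 10 GiB; and RAM is not checked because the standard library has no portable way to query it.

```python
import shutil
import sys

MIN_PYTHON = (3, 12)             # "Python 3.12+" from the prerequisites
MIN_FREE_BYTES = 10 * 1024**3    # "10GB+ disk space", assumed to mean 10 GiB


def check_prerequisites(python_version, free_bytes: int) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {python_version[0]}.{python_version[1]}"
        )
    if free_bytes < MIN_FREE_BYTES:
        problems.append(
            f"At least 10 GiB of free disk space required, "
            f"found {free_bytes / 1024**3:.1f} GiB"
        )
    return problems


if __name__ == "__main__":
    # Check the running interpreter and the current directory's filesystem.
    free = shutil.disk_usage(".").free
    for problem in check_prerequisites(sys.version_info, free):
        print(problem)
```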

Highlighted Details

  • Supports multiple LLM providers (OpenAI, Groq, LMStudio, Ollama, Azure, Google, Anthropic) and TTS providers.
  • Offers various output formats (summary, podcast, interview, debate, etc.) and styles (casual, formal, technical).
  • Includes a Gradio Web UI for user-friendly access and a FastAPI server for programmatic integration.
  • Handles multi-speaker formats (up to 5 speakers) and customizable voice selection.
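
One way to picture the multi-speaker support is a mapping from speaker names to TTS voices that enforces the five-speaker limit. The sketch below is illustrative only: assign_voices is a hypothetical helper, the voice identifiers in the usage example are made up, and reusing voices in round-robin order when there are fewer voices than speakers is an assumption, not documented behavior.

```python
MAX_SPEAKERS = 5  # the summary states "up to 5 speakers"


def assign_voices(speakers: list[str], voices: list[str]) -> dict[str, str]:
    """Map each speaker to a voice, cycling through voices if there are too few."""
    if not 1 <= len(speakers) <= MAX_SPEAKERS:
        raise ValueError(f"expected 1 to {MAX_SPEAKERS} speakers, got {len(speakers)}")
    if len(set(speakers)) != len(speakers):
        raise ValueError("speaker names must be unique")
    if not voices:
        raise ValueError("at least one voice is required")
    return {speaker: voices[i % len(voices)] for i, speaker in enumerate(speakers)}


# Usage with made-up voice identifiers:
mapping = assign_voices(["host", "guest", "expert"], ["voice_a", "voice_b"])
print(mapping)
```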

Maintenance & Community

  • Developed by Gökdeniz Gülmez.
  • Citable via provided BibTeX entry.

Licensing & Compatibility

  • The repository does not explicitly state a license in the README.

Limitations & Caveats

The README does not specify a license, which may complicate commercial use or integration into closed-source projects. Language support depends on the chosen LLM and TTS models, so users should verify output quality for non-English content. PDF extraction may fail on image-based or password-protected documents.

Health Check

  • Last commit: 3 months ago
  • Responsiveness: 1 day
  • Pull requests (30d): 0
  • Issues (30d): 1
  • Star history: 130 stars in the last 90 days
