Local-NotebookLM by Goekdeniz-Guelmez

CLI tool for local PDF-to-podcast conversion using LLMs and TTS

Created 1 year ago

774 stars

Top 45.2% on SourcePulse

1 Expert Loves This Project

Laurent Mazare

Cofounder of Kyutai

Project Summary

This project provides a local, AI-powered tool to transform PDF documents into audio content like podcasts. It's designed for researchers, content creators, and anyone needing to convert textual information into spoken-word formats, offering customization in style, length, and speaker voices.

How It Works

The tool processes PDFs through a four-step pipeline. It begins with text extraction and chunking, followed by transcript generation using LLMs. A third step optimizes the transcript for Text-to-Speech (TTS) conversion, structuring it for natural conversation and specific formats (e.g., interview, podcast). Finally, it generates audio segments using specified TTS models and voices, concatenating them into a final audio file. The system supports various LLM and TTS providers, including local options like Ollama and LMStudio, and cloud services like OpenAI and Groq.
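The four steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `chunk_text`, `run_pipeline`, the prompt strings, and the `llm` and `tts` callables are all hypothetical names, and PDF text extraction is assumed to have already happened.

```python
from typing import Callable


def chunk_text(text: str, max_chars: int = 1000) -> list[str]:
    """Split text into chunks of at most max_chars, breaking on whitespace.

    A single word longer than max_chars is kept whole in its own chunk.
    """
    chunks: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            chunks.append(current)
            current = word
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def run_pipeline(
    text: str,
    llm: Callable[[str], str],    # hypothetical: prompt in, completion out
    tts: Callable[[str], bytes],  # hypothetical: text in, raw audio bytes out
    max_chars: int = 1000,
) -> bytes:
    # Step 1: chunk the (already extracted) PDF text.
    chunks = chunk_text(text, max_chars)
    # Step 2: draft a transcript for each chunk with the LLM.
    draft = [llm(f"Write podcast dialogue for:\n{c}") for c in chunks]
    # Step 3: rewrite each draft so it reads naturally when spoken.
    polished = [llm(f"Rewrite for natural speech:\n{d}") for d in draft]
    # Step 4: synthesize each segment and concatenate.
    # Joining bytes like this is only valid for headerless audio such as raw PCM;
    # container formats (WAV, MP3) need a proper audio library to merge.
    return b"".join(tts(p) for p in polished)
```

Passing the LLM and TTS backends in as plain callables mirrors the idea that providers (local or cloud) are interchangeable, without assuming anything about how the tool itself wires them up.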

Quick Start & Requirements

Installation: run pip install local-notebooklm, or clone the repository and run pip install -r requirements.txt. Docker Compose is also available.
Prerequisites: Python 3.12+, local LLM/TTS servers (optional), 8GB+ RAM (16GB+ recommended), 10GB+ disk space.
Documentation: GitHub Repository

Highlighted Details

Supports multiple LLM providers (OpenAI, Groq, LMStudio, Ollama, Azure, Google, Anthropic) and TTS providers.
Offers various output formats (summary, podcast, interview, debate, etc.) and styles (casual, formal, technical).
Includes a Gradio Web UI for user-friendly access and a FastAPI server for programmatic integration.
Handles multi-speaker formats (up to 5 speakers) and customizable voice selection.
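As a rough illustration of multi-speaker handling, the sketch below maps each distinct speaker in a transcript to a voice and enforces the five-speaker limit mentioned above. The `Turn` type, `assign_voices` function, and voice identifiers are hypothetical and are not taken from the project's API.

```python
from dataclasses import dataclass

MAX_SPEAKERS = 5  # limit stated in the summary above


@dataclass
class Turn:
    speaker: str
    text: str


def assign_voices(turns: list[Turn], voices: list[str]) -> dict[str, str]:
    """Map each distinct speaker to a voice, in order of first appearance."""
    # dict.fromkeys removes duplicates while preserving insertion order.
    speakers = list(dict.fromkeys(t.speaker for t in turns))
    if len(speakers) > MAX_SPEAKERS:
        raise ValueError(f"at most {MAX_SPEAKERS} speakers are supported")
    if len(voices) < len(speakers):
        raise ValueError("not enough voices for the number of speakers")
    return dict(zip(speakers, voices))


# Usage: voice names here are placeholders, not real TTS voice IDs.
turns = [Turn("Host", "Welcome."), Turn("Guest", "Thanks."), Turn("Host", "Let's begin.")]
mapping = assign_voices(turns, ["voice_a", "voice_b"])
```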

Maintenance & Community

Developed by Gökdeniz Gülmez.
Citable via provided BibTeX entry.

Licensing & Compatibility

The repository does not explicitly state a license in the README.

Limitations & Caveats

Because no license is stated, the terms for commercial use or integration into closed-source projects are unclear. Language support depends on the chosen LLM and TTS models, so users should verify results for non-English content. PDF extraction may fail on image-based or password-protected documents.

Health Check

Last Commit

1 month ago

Responsiveness

Inactive

Star History

122 stars in the last 30 days