pdf-to-podcast by NVIDIA-AI-Blueprints

AI blueprint for converting PDFs into podcast-style audio

Created 1 year ago

786 stars

Top 44.7% on SourcePulse

Project Summary

This NVIDIA AI Blueprint transforms PDFs into AI-generated podcasts, targeting developers and researchers who need to create engaging audio content from documents. It leverages NVIDIA NIM microservices for flexible, private deployment, enabling on-demand audio insights without data sharing.

How It Works

The system ingests a primary PDF and optional context PDFs, using a guide prompt to focus the AI's output. It employs Docling for document parsing, NVIDIA NIM microservices (e.g., Llama 3.1 models) for response generation, and ElevenLabs for text-to-speech. This modular, microservices-based architecture allows for customization, such as swapping LLMs or disabling GPU usage for specific components to manage resource requirements.
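The flow described above can be pictured as three stages chained together. The following is a minimal sketch, not the blueprint's actual code: every name in it (PodcastRequest, parse_pdf, generate_script, synthesize_audio, run_pipeline) is hypothetical, and each stage is a stub standing in for Docling, a NIM-hosted LLM, and ElevenLabs respectively.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class PodcastRequest:
    """Hypothetical request shape: one primary PDF, optional context PDFs, a guide prompt."""
    target_pdf: str
    context_pdfs: List[str] = field(default_factory=list)
    guide_prompt: str = ""


def parse_pdf(path: str) -> str:
    # Stub for the document-parsing stage (Docling in the blueprint).
    return f"<text of {path}>"


def generate_script(target_text: str, context_texts: List[str], guide_prompt: str) -> str:
    # Stub for the LLM stage (a NIM microservice in the blueprint).
    # Here it only concatenates its non-empty inputs so the data flow is visible.
    parts = [guide_prompt, target_text, *context_texts]
    return "\n".join(p for p in parts if p)


def synthesize_audio(script: str) -> bytes:
    # Stub for the text-to-speech stage (ElevenLabs in the blueprint).
    return script.encode("utf-8")


def run_pipeline(
    request: PodcastRequest,
    parser: Callable[[str], str] = parse_pdf,
    llm: Callable[[str, List[str], str], str] = generate_script,
    tts: Callable[[str], bytes] = synthesize_audio,
) -> bytes:
    # Each stage is passed in as a callable, so one can be swapped
    # without touching the others.
    target_text = parser(request.target_pdf)
    context_texts = [parser(p) for p in request.context_pdfs]
    script = llm(target_text, context_texts, request.guide_prompt)
    return tts(script)
```

Passing the stages as parameters mirrors the modularity the summary describes: replacing the LLM, for instance, would amount to supplying a different llm callable.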

Quick Start & Requirements

Install: Clone the repository, set environment variables (ELEVENLABS_API_KEY, NVIDIA_API_KEY), and run make uv to install dependencies.
Prerequisites: Docker Engine & Compose (v2.29.1+), NVIDIA Container Toolkit (for GPU), git, Ubuntu 20.04/22.04, NVIDIA AI Enterprise developer license for local NIM hosting, API keys for NVIDIA NIM and ElevenLabs.
Run: Execute make all-services to start all microservices. Generate podcasts with python tests/test.py --target <pdf1.pdf> [--context <pdf2.pdf>] [--monologue].
Docs: Swagger UI available at localhost:8002/docs.
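
Before running the test script, it can help to confirm that both API keys are set and to assemble the command line programmatically. The sketch below is an illustration only: the helper names (missing_env_vars, build_test_command) are hypothetical and not part of the repository, and it assumes the flags behave exactly as shown in the Run step above.

```python
import sys
from typing import List, Mapping, Optional

# Environment variables named in the Install step.
REQUIRED_ENV_VARS = ("ELEVENLABS_API_KEY", "NVIDIA_API_KEY")


def missing_env_vars(env: Mapping[str, str]) -> List[str]:
    # Return the required variables that are unset or empty in env.
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def build_test_command(
    target: str,
    context: Optional[str] = None,
    monologue: bool = False,
    python: str = sys.executable,
) -> List[str]:
    # Build the argv list for tests/test.py using the flags from the Run step.
    cmd = [python, "tests/test.py", "--target", target]
    if context:
        cmd += ["--context", context]
    if monologue:
        cmd.append("--monologue")
    return cmd
```

A caller could check missing_env_vars(os.environ) first, then pass the list from build_test_command("pdf1.pdf", context="pdf2.pdf") to subprocess.run from the repository root once the services are up.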

Highlighted Details

Uses NVIDIA NIM microservices for LLM inference, with flexibility in model choice (e.g., Llama 3.1 variants); text-to-speech is handled by ElevenLabs.
Supports private network deployment for secure handling of sensitive data.
Allows customization of components and GPU assignments to optimize resource usage.
Includes tracing capabilities via Jaeger for debugging.

Maintenance & Community

The project is an NVIDIA AI Blueprint, indicating official support and development from NVIDIA. Contributions are managed via standard GitHub pull requests.

Licensing & Compatibility

The README does not explicitly state a license. Compatibility for commercial use or closed-source linking is not specified.

Limitations & Caveats

The provided setup uses HTTP and is not intended for production; SSL/TLS encryption and security headers are recommended for production deployments. Initial service startup, particularly Docling, can take 10-15 minutes.
